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Applicant: 
Group Art Unit: 
Examiner: 
Title: 
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January 18, 2002 

Torben Bach Pedersen 

Unknown 

Unknown 

METHOD AND SYSTEMS FOR MAKING OLAP 
HIERARCHIES SUMMERISABLE 



Attorney Docket: 045900-0007 1 8 



Box Non-Fee Amendment 
Commissioner of Patents and Trademarks 
Washington, D.C. 20231 

PRELIMINARY AMENDMENT 

Sir: 

Applicant submits this Preliminary Amendment to the application filed herewith, 
for consideration prior to the calculation of the filing fee, as follow: 

This application is the National Phase under 35 USC §371 of PCT International 
Application No. PCT/DKOO/00354 which has an International filing date of June 30, 
2000, which designated the United States of America the entire contents of which are 
hereby incorporated by reference. 
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IN THE CLAIMS 

Please amend the claims in accordance with the following rewritten claims in 
clean form. Applicant includes herewith an Attachment for Claim Amendments showing 
a marked up version of each amended claim. 

Please amend the following claims. 

3. A method according to claim 1 , wherein the step of creating new 
dimensional values and modifying the mapping comprises the steps of 

examine whether the dimension is covering, and in case it is, 
executing a make-onto procedure for making the dimension onto, thereby at least 
partly making an non-onto dimension aggregation normalised. 

4. A method according to claim 1 , . wherein the step of creating new 
dimensional values and modifying the mapping comprises the step of 

executing a make-covering procedure for making the dimension covering, 
thereby at least partly making a non-covering dimension aggregation normalised. 

5. A method according to claim 1 , wherein the make-strict procedure 
comprises the steps of, starting from the bottom category and successively proceeding 
towards the top category, 

identifying combinations of dimensional values of the same category for each of 
which combination at least one dimension value of a category below said category is 
linked to each of the dimension values of the combination, 
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creating offe or more new dimensional values each representing one of the 
identified combinations of dimensional values and creating links from the new 
dimensional values to dimensional values of above categories in accordance with 
existing links from each of the dimensional values represented by the new dimensional 
value, and 

identifying dimension values being linked to identified combinations of 
dimensional values of the same category and replacing the links with links to new 
dimensional values representing said combinations of dimensional values. 

6. A method according to claim 2, wherein the make-strict procedure 
comprises the successive steps of 

(i) setting the bottom category of the dimension as the child category, 

(ii) for each category being a direct predecessor of the child category of which 
category at least one dimension value of the child category is linked to a dimension 
value of, setting said category as the parent category and performing the steps of: 

(iia) ending the make-strict procedure for the parent category in case the 
parent category is the top category of the dimension, 

(iib) ending the make-strict procedure for the parent category in case no 
dimension value of the parent category is linked to a dimension value of a higher 
category, 

(iic) creating a new fused category in the dimension immediately below the 
parent category in case at least one of the dimension values of the child category is 
linked to more than one dimension value of the parent category. 
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(iid) ' for pach dimensional value of the child category, performing the steps of: 
creating a new dimension value of the new fused category representing the one or more 
values of the parent category to which the dimensional value of the child category is 
linked and creating links from said new dimension value to said values in the parent 
category, the creation of the new dimension value being conditioned that no dimension 
value of the new fused category already exists having exactly such link(s), and 

for each category being a direct predecessor of the parent category of which 
category at least one dimension value of the parent category is linked to a dimension 
value of, setting said category as a grandparent category and creating links from the 
new dimension value to the one or more dimension values of the grandparent category 
to which said one or more dimensional values of the parent category are linked, 

(lie) removing the links from the parent category to the one or more grandparent 
categories, whereby the grandparent categories no longer are direct predecessors of 
the parent category, 

(iif) creating links from each dimensional value of the child category to the 
dimension value of the new fused category having the same links to the dimension 
values of the parent category whereby the new fused category becomes a direct 
predecessor of the child category, and removing the links from the dimension values of 
the child category to the parent category, whereby the parent category no longer is a 
direct predecessor of the child category, 

and 

(iig) setting the new fused category as the child category and returning to step 

(ii). 
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7. A method according to claim 3, wherein the make-onto procedure 
comprises the steps of, starting from the top category and successively proceeding 
towards the bottom category, 

creating, for each dimension value of each category above the bottom category 
not being linked to any dimensional value of the category immediately below, a new 
dimension value in the category immediately below and creating a link between said 
new dimension value and said dimension value of the category in question. 

8. A method according to claim 3, wherein the make-onto procedure 
comprises the successive steps of 

(i) setting the top category of the dimension as the parent category, 

(ii) for each category immediately below the parent category and having dimension 
values being linked to dimension values of the parent category, setting said category as 
the child category and perform the steps of 

(iia) creating, for each dimension value of the parent category not being linked 
to any dimensional value of the child category, a new dimension value in the child 
category and creating a link between said new dimension value and said dimension 
value of the parent category, 

(iib) setting the child category as parent category, 

(iic) ending the make-onto procedure in case the parent category is the bottom 
category of the dimension, else returning to step (ii) of the make-onto procedure. 
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9. • A miethod according to claim 4. wherein the make-covering procedure 
comprises the successive steps of 

identifying links between dimension values of two categories having at least one 
intermediate category there between, 

creating a new dimension value in each of said intermediate categories for each 
of those links for which no paths of links exists going only through immediate child- 
parent links from lower to higher categories and including a link to a dimension value of 
the intermediate category, and 

replacing those links with links between the dimension values of those links and 
the new dimension values. 

10. A method according to claim 4, wherein the make-covering procedure 
comprises the successive steps of 

(i) setting the bottom category of the dimension as the child category, 

(ii) for each category immediately above the child category for which at least one link 
between a dimension value of said category and a dimension value of the child category 
exists, setting the category as the parent category and perform the steps of: 

(iia) ending the make-covering procedure for the parent category in case the 
parent category is the top category of the dimension:, 

(iib) for each higher category being a direct predecessor category of the child 
category and being higher in the hierarchy than the parent category, performing the 
steps of 
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' (iiba)' identifying sets of dimension values of the higher category and 
dimension values of the child category for which sets 
a link exists, and 

no paths of links going only from lower to higher categories and including 
a link to a dimension value of the parent category exists, and 

(iibb) creating for each identified set of dimension values a new dimension 
value in the parent category, creating links between each of the dimension 
values of the set and the new dimension value, and removing the link between 
the two dimension values of the identified set, whereby the higher category no 
longer is a predecessor of the child category, 

(iic) setting the parent category as the child category and returning to step (ii). 

11. A method for by means of a computer to at least partly aggregation 
normalise a multidimensional object including a set of facts comprising a plurality of 
facts mapped on a plurality of dimensions having dimension values organised into 
categories of dimension values based on a partial ordering, the multidimensional object 
comprising mappings of links between dimension values within each dimension, by 
means of applying the method of claim 1 to at least one of the dimensions of the 
multidimensional object. 

1 3. A method according to claim 1 1 , comprising the steps of 
selecting a subset of categories of the one or more dimension to be aggregation 
normalised, and 
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performing an aggregation normalisation of the selected subset, 
whereby one or more of the dimensions of the multidimensional object is/are only partly 
aggregation normalised. 

14. A method according to claim 1 1 , comprising the steps of 
selecting specific aggregation functions to be performed on the multidimensional 
object, and 

selecting by means of the computer normalisation steps to be performed based 
on the selection of specific aggregation functions to be performed, 
whereby one or more of the dimensions of the multidimensional object is/are only partly 
aggregation normalised. . 

17. A method according to claim 1 5, wherein the step of creating new 
dimensional values and modifying the mapping comprises the step of 

executing a make-covering procedure for making the multidimensional object 
covering, thereby at least partly making the non-covering multidimensional object 
aggregation normalised. 

18. A method according to claim 15, wherein the method comprises the initial 
step of making each of the plurality of dimensions aggregation normalised. 

19. A method according to claim 1 1 , wherein the created new dimensional 
values are marked as such, a pre-aggregation is performed on a multidimensional 
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object being' normalised by means oTf tine computer and the method further comprises 
the step of 

producing a reply to a query made to the system and concerning the 
multidimensional object, aggregate queries, exploring the dimension hierarchies, as well 
as navigation queries, that summarise the data at various levels of detail, in which reply 
the existence of the created new dimensional values is transparent. 

20. A method according to claim 1 1 , further comprising the steps of 
implementing, into the aggregation normalised multidimensional object, of new 
facts including mapping of the facts onto the dimension, of new dimension values of the 
dimensions, or of new mapping between some of the dimension values, by which 
implementation irregularities of the multidimensional object is introduced, 

analysing the introduced irregularities of the dimensions of the multidimensional 

object, 

creating new dimensional values of the multidimensional object and modifying 
the mapping between dimensional values of the multidimensional object according to 
the analysis, whereby the multidimensional object is aggregation normalised, and 

saving the new dimensions and the modified mapping in data storage means of 
the computer. 

21 . A computer system comprising at least one general purpose computer 
having data storage means associated therewith on which data storage means is stored 
a computer programme product suitable for adapting the computer to perform an at 



Serial No. Unknown - New PCT Application 



Page 9 



least partly Aggregation normalisation of a multidimensional object according to the 
method of claim 11, the computer system comprising means for retrieving the computer 
programme product and perform accordingly. 

22. A computer programme product suitable for adapting a general purpose 
computer to perform an at least partly aggregation normalisation of a multidimensional 
object according to the method of claim 1 1 . 

25. A computer system according to claim 23, wherein the query handler 
component is adapted for producing replies to aggregate queries in which replies the 
existence of the second plurality of dimensions is transparent 

28. A computer system according to claim 23 further comprising means 
adapted for performing an at least partly aggregation normalisation of a 
multidimensional object. 

REMARKS 

Claims 3-11, 13, 14, 17, 18-21 ,22, 25, and 28 were amended and 1,2, 12, 
15-16, 23-24, and 26-27 are now pending the application. Claims 1, 15, and 23 
currently remain independent in this application. The amendments to the claims 
contained herein are of equivalent scope as originally filed and, thus, are not a 
narrowing amendment. It is submitted that this Amendment places the application in 
suitable condition for allowance; notice of which is requested. If the Examiner believes 
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that personal communication will expedite prosecution of this application, the Examiner 
is invited to telephone the undersigned at (703) 390-3030. 

Respectfully submitted, 



Dated: 




Harness, Dickey & Pierce, P.L.C. 
P.O. Box 8910 
Reston, Virginia 20195 . 
(703) 390-3030 
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• ATTACHMENT FOR CLAIM AMENDMENTS 

The following is a marked up version of each amended claim in which 
underlines indicates insertions and brackets indicate deletions. 

3. (Amended) A method according to claim 1 [or 2], wherein the step of 
creating new dimensional values and modifying the mapping comprises the steps of 

examine whether the dimension is covering, and in case it is, 
executing a make-onto procedure for making the dimension onto, thereby at least 
partly making an non-onto (dimension aggregation normalised. 

4. (Amended) A method according to claim 1 [any of claims 1-3], wherein the 
step of creating new dimensional values and modifying the mapping comprises the step 
of 

executing a make-covering procedure for making the dimension covering, 
thereby at least partly making a non-covering dimension aggregation normalised. 

5. (Amended) A method according to claim 1 [any of claims 2-4], wherein the 
make-strict procedure comprises the steps of, starting from the bottom category and 
successively proceeding towards the top category, 

identifying combinations of dimensional values of the same category for each of 
which combination at least one dimension value of a category below said category is 
linked to each of the dimension values of the combination, 

creating one or more new dimensional values each representing one of the 
identified combinations of dimensional values and creating links from the new 
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dimensional Values to dimensional values of above categories in accordance with 
existing links from each of the dimensional values represented by the new dimensional 
value, and 

identifying dimension values being linked to identified combinations of 
dimensional values of the same category and replacing the links with links to new 
dimensional values representing said combinations of dimensional values. 

6. (Amended) A method according to claim 2 [any of claims 2-5], wherein the 
make-strict procedure comprises the successive steps of 

(i) setting the bottom category of the dimension as the child category, 

(ii) for each category being a direct predecessor of the child category of which 
category at least one dimension value of the child category is linked to a dimension 
value of, setting said category as the parent category and performing the steps of: 

(iia) ending the make-strict procedure for the parent category in case the 
parent category is the top category of the dimension, 

(iib) ending the make-strict procedure for the parent category in case no 
dimension value of the parent category is linked to a dimension value of a higher 
category, 

(iic) creating a new fused category in the dimension immediately below the 
parent category in case at least one of the dimension values of the child category is 
linked to more than one dimension value of the parent category, 

(iid) for each dimensional value of the child category, performing the steps of: 
creating a new dimension value of the new fused category representing the one or more 
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values of the parent category to which the dimensional value of the child category is 
linked and creating links from said new dimension value to said values in the parent 
category, the creation of the new dimension value being conditioned that no dimension 
value of the new fused category already exists having exactly such link(s), and 

for each category being a direct predecessor of the parent category of which 
category at least one dimension value of the parent category is linked to a dimension 
value of, setting said category as a grandparent category and creating links from the 
new dimension value to the one or more dimension values of the grandparent category 
to which said one or more dimensional values of the parent category are linked, 

(iie) removing the links from the parent category to the one or more grandparent 
categories, whereby the grandparent categories no longer are direct predecessors of 
the parent category, 

(iif) creating links from each dimensional value of the child category to the 
dimension value of the new fused category having the same links to the dimension 
values of the parent category whereby the new fused category becomes a direct 
predecessor of the child category, and removing the links from the dimension values of 
the child category to the parent category, whereby the parent category no longer is a 
direct predecessor of the child category, 

and 

(iig) setting the new fused category as the child category and returning to step 

(ii). 
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7. * (Amended) A method according to claim 3 [any of claims 3-6], wherein the 
make-onto procedure comprises the steps of, starting from the top category and 
successively proceeding towards the bottom category, 

creating, for each dimension value of each category above the bottom category 
not being linked to any dimensional value of the category immediately below, a new 
dimension value in the category immediately below and creating a link between said 
new dimension value and said dimension value of the category in question. 

8. (Amended) A method according to claim 3 [any of claims 3-7], wherein the 
make-onto procedure comprises the successive steps of 

(!) setting the top category of the dimension as the parent category, 
(ii) for each category immediately below the parent category and having dimension 
values being linked to dimension values of the parent category, setting said category as 
the child category and perform the steps of 

(iia) creating, for each dimension value of the parent category not being linked 
to any dimensional value of the child category, a new dimension value in the child 
category and creating a link between said new dimension value and said dimension 
value of the parent category, 

(lib) setting the child category as parent category, 

(iic) ending the make-onto procedure in case the parent category is the bottom 
category of the dimension, else returning to step (ii) of the make-onto procedure. 
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9. ' (Amended) A method according to claim 4 [any of claims 4-8], wherein the 
make-covering procedure comprises the successive steps of 

identifying links between dimension values of two categories having at least one 
intermediate category there between. 

creating a new dimension value in each of said intermediate categories for each 
of those links for which no paths of links exists going only through immediate child- 
parent links from lower to higher categories and including a link to a dimension value of 
the intermediate category, and 

replacing those links with links between the dimension values of those links and 
the new dimension values. 

1 0. (Amended) A method according to claim 4 [any of claims 4-9], wherein the 
make-covering procedure comprises the successive steps of 

(i) setting the bottom category of the dimension as the child category, 

(ii) for each category immediately above the child category for which at least one link 
between a dimension value of said category and a dimension value of the child category 
exists, setting the category as the parent category and perform the steps of: 

(iia) ending the make-covering procedure for the parent category in case the 
parent category is the top category of the dimension:, 

(iib) for each higher category being a direct predecessor category of the child 
category and being higher in the hierarchy than the parent category, performing the 
steps of 
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' (iiba) ' identifying sets of dimension values of the higher category and 
dimension values of the child category for which sets 
a link exists, and 

no paths of links going only from lower to higher categories and including 
a link to a dimension value of the parent category exists, and 

(iibb) creating for each identified set of dimension values a new dimension 
value in the parent category, creating links between each of the dimension 
values of the set and the new dimension value, and removing the link between 
the two dimension values of the identified set, whereby the higher category no 
longer is a predecessor of the child category, 

(iic) setting the parent category as the child category and returning to step (ii). 

1 1 . (Amended) A method for by means of a computer to at least partly 
aggregation normalise a multidimensional object including a set of facts comprising a 
plurality of facts mapped on a plurality of dimensions having dimension values 
organised into categories of dimension values based on a partial ordering, the 
multidimensional object comprising mappings of links between dimension values within 
each dimension, by means of applying the method of claim 1 [any of claims 1-10] to at 
least one of the dimensions of the multidimensional object. 

1 3. (Amended) A method according to claim 1 1 [or 1 2], comprising the steps 

of 
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selecting a subset of categories of the one or more dimension to be aggregation 
normalised, and 

performing an aggregation normalisation of the selected subset, 
whereby one or more of the dimensions of the multidimensional object is/are only partly 
aggregation normalised. 

14. (Amended) A method according to claim 1 1 [any of claims 11-13], 
comprising the steps of 

selecting specific aggregation functions to be performed on the multidimensional 

object, and 

selecting by means of the computer normalisation steps to be performed based 
on the selection of specific aggregation functions to be performed, 
whereby one or more of the dimensions of the multidimensional object is/are only partly 
aggregation normalised. . 

1 7. (Amended) A method according to claim 15 [or 1 6], wherein the step of 
creating new dimensional values and modifying the mapping comprises the step of 

executing a make-covering procedure for making the multidimensional object 
covering, thereby at least partly making the non-covering multidimensional object 
aggregation normalised.. 
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1 8. ' (Amended) A method according to claim 15 [any of claims 1 5-1 7], wherein 
the method comprises the initial step of making each of the plurality of dimensions 
aggregation normalized [by means of the method according to any of claims 1-10]. 

19. (Amended) A method according to claim 1 1 [any of claims 11-18], wherein 
the created new dimensional values are marked as such, a pre-aggregation is 
performed on a multidimensional object being normalised by means of the computer 
and the method further comprises the step of 

producing a reply to a query made to the system and concerning the 
multidimensional object, aggregate queries, exploring the dimension hierarchies, as well 
as navigation queries, that summarise the data at various levels of detail, in which reply 
the existence of the created new dimensional values is transparent. 

20. (Amended) A method according to claim 1 1 [any of claims 11-1 9], further 
comprising the steps of 

implementing, into the aggregation normalised multidimensional object, of new 
facts including mapping of the facts onto the dimension, of new dimension values of the 
dimensions, or of new mapping between some of the dimension values, by which 
implementation irregularities of the multidimensional object is introduced, 

analysing the introduced irregularities of the dimensions of the multidimensional 

object, 
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creating new dimensional values of the multidimensional object and modifying 
the mapping between dimensional values of the multidimensional object according to 
the analysis, whereby the multidimensional object is aggregation normalised, and 

saving the new dimensions and the modified mapping in data storage means of 
the computer. 

21 . (Amended) A computer system comprising at least one general purpose 
computer having data storage means associated therewith on which data storage 
means is stored a computer programme product suitable for adapting the computer to 
perform an at least partly aggregation normalisation of a multidimensional object 
according to the method of claim 1 1 [any of claims 11 -20], the computer system 
comprising means for retrieving the computer programme product and perform 
accordingly. 

22. (Amended) A computer programme product suitable for adapting a 
general purpose computer to perform an at least partly aggregation normalisation of a 
multidimensional object according to the method of claim 1 1 [any of claims 1 1-20]. 

25. (Amended) A computer system according to claim 23 [or 24], wherein the 
query handler component is adapted for producing replies to aggregate queries in which 
replies the existence of the second plurality of dimensions is transparent. 
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28. (Amended) A computer system according to claim 23 [any of claims 23- 
27] further comprising means adapted for performing an at least partly aggregation 
normalisation of a multidimensional object [according to the method of any of claims 1 1- 
20]. 
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METHOD AND SYSTEMS FOR MAKING OLAP HIERARCHIES SUMMARISABLE 

Field of the invention 

The present invention relates to a method, a computer system, and a computer 
5 programme product for a computer system for transforming general On-line Analytical 
Processing (OLAP) hierarchies into summarisable hierarchies whereby pre-aggregation is 
enabled. Thereby, fast query response times for aggregation queries without excessive 
storage use are made possible even when the hierarchies originally are inregular. 

Background of the invention 

On-line Analytical Processing (OLAP) systems, which aim to ease the process of 
extracting useful information from large amounts of detailed transactional data, have 
gained widespread acceptance in traditional business applications as well as in new 
applications such as health care. These systems generally offer a dimensional view of 
data, in which measured values, tenmed facts, are characterised by descriptive values, 
drawn from a number of dimensions; and the values of dimension are typically organised 
in a containment-type hierarchy. A prototypical query applies an aggregate function, such 
as average, to the facts characterised by specific values from the dimensions. 

20 Fast response times are required from these systems, even for queries that aggregate 
large amounts of data. The perhaps most central technique used for meeting this 
requirement is termed pre-aggregation, where the results of aggregate queries are pre- 
computed and stored, i.e., materialised, for later use during query processing. Pre- 
aggregation has attracted substantial attention in the research community, where it has 

25 been investigated how to optimally use pre-aggregated data for query optimisation [7.3] 
and how to maintain the pre-aggregated data when base data is updated [19,24]. Further, 
the latest versions of commercial RDBMS products offer query optimisation based on pre- 
computed aggregates and automatic maintenance of the stored aggregate when base 
data is updated [30]. 

30 

The fastest response times may be achieved when materialising aggregate results 
corresponding to all combinations of dimension values across all dimensions, termed full 
(or eager) pre-aggregation. However, the required storage space grows rapidly, to quickly 
become prohibitive, as the complexity of the application increases. This phenomenon is 
35 called dafa explosion [4, 21 . 27] and occurs because the number of possible aggregation 
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combinations grows rapidly when the number of dimension » 
sparseness of the multidimensional space decreases in hig^r 
that aggregates at higher levels take up nearly as much 
In some commercial applications, full pre-aggregation take^ 
5 space ofthe raw data [21). Another problem with full 

long to update the materialised aggregates when base datii 



spsce 



pre-aggregati 



With the goal of avoiding data explosion, research has foased on how to select the best 



subset of aggregation levels given space constraints [1, 9. 
10 .time constraints 110], or the best combination of aggregate 

approach Is commonly referred to as pmtdcal (or partial or semi-eager [5. 11. 29]) pre- 
aggregation. Commercial OLAP systems now also exist XY at employ practical pre* 
aggregation, e.g., Microsoft Decision Support Services (Pis ito) (18] and Informix 
MetaCube [13]. 
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increase, while the 
dimension levels, meaning 
as lovkrer-level aggregates, 
up as much as 200 times the 
ion is that it takes too 
changes. 



11. 26, 28. 32] or maintenance 
data and indices [8]. This 



15 



The premise underlying the appficability of practical pre-ag jregation is that lower-level 
aggregates can t>e m-i/secf to compute tiigher-ievel aggregates, known as summarisability 
[16]. Summarisability occurs when the mappings in the dir leiwion hierarchies are onto 
(all paths in the hierarchy have equal lengths), covering (only immediate parent and child 
20 values can be related), and strict (each chihJ in a luerarchy has only one parent): and 
when also the relationships between facts and dimensions are many-to-one and facts are 



However, the data 
with this rigid regime. This 



always mapped to the lowest levels in the dimensions [16] 
encountered in many real-world applications fail to comply i 
motivates the search for techniques that allow practical pre -aggregation to be used for a 
25 wkJerrangeof applications, the focus of the present invent ion. 



Description of the Invention 

hAotivated by the increasing use of OLAP systems in many 
in business and health care, the present invention provides 
30 multidimensional databases that leverage the existing, 

known as practical, or partial or semi-eager, pre-aggregatijin, 
relevant to a much virider range of real-world applications. 



different applications, includirig 
transformation techniques for 
performance-enhancing technique. 
I.- by making this techruque 



Current pre-aggregation techniques assume that the dime isional structures are 
35 sununarisablB. Specifically, the mappings in dimension hie|rarchies must be onfo. 
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A new database operator that generalises aggregatiorjs 
is disclosed by Jim Gray et al. "Data Cube: A 
Generalizing group-By, Cross'Tab and Suiy-Toials' 
Discovery 1, 1997, and solutions are proposed on hovi 
5 execution and SQL-language level; It is mentioned tha ; 
renders the pre-aggregation impossible but no solutior 



for the N-dimensional data space 
ReiationkI Aggregation Operator 
Mining and Knowledge 
to integrate this operator on the 
irregular dimension hierarchies 
to this prot>lem is provided. 
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covering, and strict the relationships between facts and dimensions must be many-to- 
one, and the facts must always be mapped to the lowest categories in dimensions. 

The present invention presents novel transformation techniques that render dimensions 
5 with hierarchies that are non-onto, non-covering, and non-strict summarisable. The 
transformations have practically low computational complexity, they may be implemented 
using standard relational database technology, and it is also disclosed how to integrate 
the transformed hierarchies in current OLAP systems, transparently to the user. 

10 The present description also disclose how to apply the transformations according to the 
invention to the cases of non-summarisable relationships between facts and dimensions, 
which also occur often in real-wortd applications. Finally, it is shown how to modify the 
algorithms to incrementally maintain the transfonmed hierarchies when the underiying data 
is modified. To our knowledge, this work is the first to present algorithms to automatically 

15 achieve summarisabillty for non-covering and non-onto hierarchies. The research 
reported here is also the first to demonstrate techniques and algorithms for achieving 
summarisabillty in non-strict hierarchies. The integration of the techniques into current 
systems, transparently to the user, we believe is a novel feature. 

20 The multidimensional databases or objects are, as the term indicates, comprised of a 
plurality of dimensions each being a hierarchy and the basic of the invention is to 
transform dimensions being irregular, that is being non-covering and/or non-onto and/or 
non-strict, into dimensions that are at least partly aggregation normalised, for most 
practical cases preferably fully aggregation normalised, to enable practical pre- 

25 aggregation of the dimensions. Thus, the present invention relates to a method for by 
means of a computer to at least partly aggregation normalise a dimension having 
dimension values organised into categories of dimension values based on a partial 
ordering, the dimension comprising mappings of links between dimension values, the 
method comprising the steps of 

30 retrieve the mapping from data storage means associated with the computer, 

analysing the mapping to determine irregularities of the dimension by means of 
analysing means executed by the computer, 

creating new dimension values of the dimension and modifying the mapping 
between dimensional values of the dimension according to the analysis, whereby the 

35 dimension is at least partly aggregation normalised, and 
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saving the new dimension values and the modified mappings in data storage 
means of the computer. 

Most important for a preferred embodiment of the present invention is to transform the 
5 dimension to a aggregation strict dimension, thus the step of creating new dimensional 
values and modifying the mapping preferably comprises the step of 

executing a make-strict procedure for making the dimension aggregation strict, 
thereby making the non-strict dimension aggregation normalised, the make-strict 
procedure bemg executed on the condition that the dimension is covering as well as onto 
1 0 prior to the execution. 

The step of creating new dimensional values and modifying the mapping may further 
comprise the step of 

executing a make-onto procedure for making the dimension onto, thereby at least 
15 partly making an into dimension aggregation normalised, the make-onto procedure being 
executed on the condition that the dimension is covering prior to the execution. This step 
is included and executed if the method should be able to handle dimensions being non- 
onto. 

20 Additionally or altematively, the step of creating new dimensional values and modifying 
the mapping may comprise the step of 

executing a make-covering procedure for making the dimension covering, thereby 
at least partly making a non-covering dimension aggregation normalised. This step is 
included and executed if the method should be able to handle dimensions being non- 
25 covering. 

The make-strict procedure comprises in a preferred embodiment of the invention the 
steps of, starting from the bottom category and successively proceeding towards the top 
category, 

30 identifying combinations of dimensional values of the same category for each of 

which combination at least one dimension value of a category below said category is 
linked to each of the dimension values of the combination, 

creating one or more new dimensional values each representing one of the 
identified combinations of dimensional values and creating links from the new dimensional 
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values to dimensionat values of above categories in accordance with existing links from 
each of the dimensional values represented by the new dimensional value, and 

identifying dimension values being linked to identified combinations of dimensional 
values of the same category and replacing the links with links to new dimensional values 
5 representing said combinations of dimensional values. 

According to an even more preferred embodiment, the make-strict procedure comprises 
the successive steps of 

(s) setting the bottom category of the dimension as the child category, 
1 0 (ii) for each category being a direct predecessor of the child category of which 

category at least one dimension value of the child category is linked to a dimension value 

of, setting said category as the parent category and performing the steps of: 

(iia) ending the make-strict procedure for the parent category in case the parent 

category is the top category of the dimension, 
15 (lib) ending the make-strict procedure for the parent category in case no 

dimension value of the parent category is linked to a dimension value of a higher 

category. 

(iic) creating a new fused category in the dimension immediately below the 
parent category in case at least one of the dimension values of the child category is linked 

20 to more than one dimension value of the parent category, 

(iid) for each dimensional value of the child category, performing the steps of: 
creating a new dimension value of the new fused category representing the one or more 
values of the parent category to which the dimensional value of the child category is 
linked and creating links from said new dimension value to said values in the parent 

25 category, the creation of the new dimension value being conditioned that no dimension 
value of the new fused category already exists having exactly such link(s), and 

for each category being a direct predecessor of the parent category of which 
category at least one dimension value of the parent category is linked to a dimension 
value of, setting said category as a grandparent category and creating links from the new 

30 dimension value to the one or more dimension values of the grandparent category to 
which said one or more dimensional values of the parent category are linked, 

(iie) removing the links from the parent category to the one or more grandparent 
categories, whereby the grandparent categories no longer are direct predecessors of the 
parent category. 
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(iif) creating links from each dimensional value of the child category to the 
dimension value of the new fused category having the same links to the dimension values 
of the parent category whereby the new fused category becomes a direct predecessor of 
the child category, and removing the links from the dimension values of the child category 

5 to the parent category, whereby the parent category no longer is a direct predecessor of 
the child category, 
and 

(iig) setting the new fused category as the child category and returning to step 
(ii). This procedure is described as a recursive process but it is within the scope of the 

10 present invention to include a similar iterative procedure which would be obvious to the 
person skilled in the art. 

The make-onto procedure comprises according to a preferred embodiment of the 
invention the steps of, starting from the to category and successively proceeding towards 
15 the bottom category, 

creating, for each dimension value of each category above the bottom category 
not being linked to any dimensional value of the category immediately below, a new 
dimension value in the category immediately below and creating a link between said new 
dimension value and said dimension value of the category in question. 

20 

According to an even more preferred embodiment, the make-onto procedure comprises 
the successive steps of 

(i) setting the top category of the^dimension as the parent category, 

(ii) for each category immediately below the parent category and having dimension 
25 values being linked to dimension values of the parent category, setting said category as 

the child category and perform the steps of 

(iia) creating, for each dimension value of the parent category not being linked 
to any dimensional value of the child category, a new dimension value in the child 
category and creating a link between said new dimension value and said dimension value 
30 of the parent category, 

(lib) setting the child category as parent category, 

(isc) ending the make*onto procedure in case the parent category is the bottom 
category of the dimension, else returning to step (ti) of the make-onto procedure. As for 
the make-strict procedure, this procedure is described as a recursive process but it is 
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Within the scope of the present invention to include a similar iterative procedure which 
would be obvious to the person skilled in the art. 

The make-covering procedure comprises according to a preferred embodiment of the 
5 present invention the successive steps of 

identifying links between dimension values of two categories having at least one 
intermediate category there between, 

creating a new dimension value in each of said intermediate categories for each of 
those links for which no paths of links exists going only through immediate child-parent 
10 links from lower to higher categories and including a link to a dimension value of the 
intermediate category, and 

replacing those links with links between the dimension values of those links and 
the new dimension values. 

15 According to an even more preferred embodiment, the make-covering procedure 
comprises the successive steps of 

(i) setting the bottom category of the dimension as the child category, 

(ii) for each category immediately above the child category for which at least one link 
between a dimension value of said category and a dimension value of the child category 

20 exists, setting the category as the parent category and perform the steps of: 

(iia) ending the make-covering procedure for the parent category in case the 
parent category is the top category of the dimension:, 

(iib) for each higher category being a direct predecessor category of the child 
category and being higher in the hierarchy than the parent category, performing the steps 

25 of 

(iiba) identifying sets of dimension values of the higher category and 
dimension values of the child category for which sets 

a link exists, and 

no paths of links going only from lower to higher categories and including a 
30 link to a dimension value of the parent category exists, and 

(iibb) creating for each identified set of dimension values a new dimension 
value in the parent category, creating links between each of the dimension values 
of the set and the new dimension value, and removing the link t>etween the two 
dimension values of the identified set, whereby the higher category no longer is a 

35 predecessor of the child category. 
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(lie) setting the parent category as the child category and returning to step (ii). 
As with the above described procedures, this procedure is also described as a recursive 
process but it is within the scope of the present invention to include a similar iterative 
procedure which would be obvious to the person skilled in the art. 

5 

The present invention further relates to a method for by means of a computer to at least 
partly aggregation normalise a multidimensional object including a set of facts comprising 
a plurality of foots mapped on a plurality of dimensions having dimension values 
organised into categories of dimension values based on a partial ordering, the 

10 multidimensional object comprising mappings of links between dimension values within 
each dimension, by means of applying the method according to previous description 
including the variations thereof to at least one of the dimensions of the multidimensional 
object. In order to handle multidimensional object comprising a plurality of facts and in 
which the mapping comprises links from each of the facts to at least one dimension value 

15 in each of the plurality of dimensions, the facts may be organised in the hierarchies of the 
dimensions so that they constitute the bottom layer of each of the dimensions of the 
multidimensional object where after the method is applied to the object. 

According to a variation of the present invention which for may practical situations of pre- 
20 aggregation is preferred, one or more of the dimensions of the multidimensional object is 
only partly aggregation normalised, the performance of the partly aggregation 
normalisation being based on a selection of a subset of categories of the one or more 
dimension to be aggregation normalised. In particular, one or more of the dimensions of 
the multidimensional object may be only partly aggregation normalised in that the 
25 normalisation steps to be performed are selected by means of the computer based on a 
selection of specific aggregation functions to be performed on the multidimensional object. 

The present Invention relates in a yet further aspect and as an alternative to the above 
disclosed method of including the facts into the multidimensional object and perform the 

30 aggregation normalisation on the whole object, to a method for by means of a computer to 
at least partly aggregation normalise a multidimensional object including a set of facts 
comprising a plurality of facts mapped on an aggregation normalised plurality of 
dimensions having dimension values being organised into categories of dimension values 
based on a partial ordering, the multidimensional object comprising mappings of links 

35 between dimension values within each dimension. 
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the method comprising the steps of 

retrieve the mapping from data storage means associated mth the computer, 
including the mapping of the plurality of facts on the muttidimenstonal object into 

the mapping of the multidimensional object so that the mapping comprises links from each 
5 of the facts to at least one dimension value in each of the plurality of dimensions, and the 

facts constitutes the bottom layer of each of the dimensions of the multidimensional 

object, 

analysing the mapping of the multidimensional object to determine irregularities of 
the dimensions by means of analysing means executed by the computer, 
10 creating new dimension values of the multidimensional object and modifying the 

mapping between dimensional values of the multidimensional object according to the 
analysis, whereby the multidimensional object is at least partly aggregation normalised, 
and 

saving the new dimensions and the modified mapping in data storage means of 
15 the computer. 

In a preferred embodiment of the yet further aspect of the present invention, the step of 
creating new dimensional values and modifying the mapping comprises the step of 
executing a make-strict procedure for making the multidimensional object 
20 aggregation strict, thereby making the non-strict multidimensional object aggregation 
normalised, the make-strict procedure being executed on the condition that the 
multidimensional object is covering prior to the execution. 

Additionally or alternatively, the step of creating new dimensional values and modifying 
25 the mapping comprises the step of 

executing a make-covering procedure for making the multidimensional object 
covering, thereby at least partly making the non-covering multidimensional object 
aggregation normalised. 

30 The make-strict and make-covering procedures may within the scope of the present 
invention be constructed similariy to the above-described procedures with the necessary 
modifications which for the skilled person would be straight forward. 
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The method according to the yet further aspect may further comprise the initial step of 
making each of the plurality of dimensions aggregation normalised by means of tiie 
method described previously. 

5 The created new dimensional values may be marked as such, e.g. be stored in a parallel 
new dimension or in another manner be recognisable by the computer as being new 
values, likewise may the changed mapping be made recognisable, where after a pre- 
aggregation is performed on a multidimensional object being normalised by means of the 
computer according to the method and the method may further comprise the step of 

10 producing a reply to a query made to the system and concerning the 

multidimensional object, aggregate queries as well as navigation queries, in which reply 
the existence of the created new dimensional values is transparent. Thereby, the method 
of the present invention may be added to existing methods of replying to queries to 
multidimensional objects thus making the existing methods capable of handling irregular 

15 multidimensional objects without changing the users access to make queries or the 
replies to the queries. Thus, it is hidden or transparent to the user or the computer 
programme making the queries that the original multidimensional object is irregular and 
not aggregation normal. 

20 Multidimensional objects are often in practical use updated with new facts and it is 
preferred that the method is able to handle such updates, which may be very frequent, 
without the need to perform the above method on the entire original multidimensional 
object, which may be a very timely procedure. Thus, the method may further comprise the 
steps of 

25 implementing, into the aggregation normalised multidimensional object, of new 

facts including mapping of the facts onto the dimension, of new dimension values of the 
dimensions, or of new mapping between some of the dimension values, by which 
implementation irregularities of the multidimensional object is introduced. 

analysing the introduced irregularities of the dimensions of the multidimensional 

30 object, 

creating new dimensional values of the multidimensional object and modifying the 
mapping between dimensional values of the multidimensional object according to the 
analysis, whereby the multidimensional object is aggregation normalised, and 

saving the new dimensions and the modified mapping in data storage means of 
35 the computer. 



SUBSTITUTE SHEET (RULE 26) 



Tj..;i ,:':t. ..H.. "-J- n.. ...u., 15 .i: "li^ -j' i\ ? 
WO 01/08041 1 1 PCT/DKOO/00354 

The present invention relates further to a computer system comprising at least one 
general purpose computer having data storage means associated therewith on which data 
storage means is stored a computer programme product suitable for adapting the 
5 computer to perform an at least partly aggregation normalisation of a multidimensional 
object according to the above-described method(s), the computer system comprising 
means for retrieving the computer programme product and perform accordingly. 

The present invention relates yet further to a computer programme product suitable for 
10 adapting a general purpose computer to perform an at least partly aggregation 

normalisation of a multidimensional object according to the above-described method(s). 

A still further aspect of the present invention relates to a computer system having data 
storage means associated therewith on which a multidimensional object is stored, the 
15 multidimensional object including 

a set of facts comprising a plurality of facts, 

a first plurality of dimensions having dimension values being organised into 
categories of dimension values based on a partial ordering and comprising a first mapping 
of links between dimension values within each dimension of the first plurality of 
20 dimensions as well as links between the facts and the dimensions of the first plurality of 
dimensions, at least one of the dimensions of the first plurality of dimensions being 
irregular, and 

a second plurality of dimensions having dimension values being organised into 
categories of dimension values based on a partial ordering and comprising a second 
25 mapping of links between dimension values within each dimension of the second plurality 
of dimensions as well as links between the facts and the dimensions of the second 
plurality of dimensions, each of the second plurality of dimensions being aggregation 
normalised, 

the computer system comprising a query handler component being adapted for 
30 producing replies to queries made to the computer system and concerning the 

multidimensional object, the replies to navigation queries being based on the first set of 
dimensions and the replies to aggregate queries being based on the second set of 
dimensions. Thereby, the transparency to the user and the integratibility with known 
systems is in general achieved. Furthermore, a set of pre-aggregation data relating to the 
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second plurality of dimensions may be stored within the data storage means and the 
replies to aggregate queries furthermore are based on the set of pre-aggregation data. 

The query handler component may according to the computer system of the still further 
5 aspect be adapted for producing replies to aggregate queries in which replies the 
existence of the second plurality of dimensions is transparent. The query handler 
component may furthermore be adapted for transforming aggregate queries made to the 
first plurality of dimensions into queries for the second set of dimensions and transforming 
replies based on the second set of dimensions into replies as based on the first set of 

10 dimensions, thus making the existence of the second plurality of dimensions transparent 
in the produced reply. Additionally, the multidimensional object may be stored within the 
data storage means of the computer system in tables organised as a combination of star 
schemes for the part of the multidimensional object containing only strict mappings, and 
additional tables containing the non-strict part of the mappings, the query handler 

15 component makes use of said tables in transforming queries and replies. 

The computer system according to the still further aspect of the present invention may 
suitably be combined with the above-described method(s) and comprise means adapted 
for performing an at least partly aggregation normalisation of a multidimensional object 
20 according to the method(s). 

Description of the Preferred Embodiment 

The invention will be described primarily as a method for making non-summarisable 
hierarchies in multi-dimensional databases summarisable. However, a person skilled in 
25 the art will recognise that an apparatus, such as a data processing system, including a 
CPU. memory. I/O, program storage, a connecting bus, and other appropriate 
components, could be programmed or otherwise designed to facilitate the practice of the 
method of the invention. Such a system would include appropriate program means for 
executing the method of the invention. 

30 

Also, an article of manufacture, such as a pre-recorded disk or other similar computer 
program products, for use with a data processing system, could include a storage medium 
and program means recorded thereon for directing the data processing system to facilitate 
the practice of the method of the invention. Such apparatus and articles of manufacture 
35 also fall within the spirit of the invention. 
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Detailed descnfytion of the Invention 
Wte now proceed to describe the invention in detail. The 
clinical case shidy that exemplifies the non-summarisabte 

5 appDcations. "me following section proceeds to define the 
data model necessary for describing the new techniques, 
properties related to summarisabitity. Algorithms are 
dimension hierarchies to achieve summarisabflily. then 
summarisable relaltonships behween facts and dimensions 

10 the techniques may be integrated into current systems 
to modify the algorithms to accommodata incremental 
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Brief description of figures 
The detailed description of the invention is accompanied by a set of figures of which 



Rg. 1 is an ER diagram illustrating the underlying data 



of the case study. 



Fig. 2 illustrates the dimension types of the case study, 



Fig. 3 illustrates on the left the transfbnnation performejd 
Covering algorithm, and on the right the 
10 Make Onto algorithm, 



on the hierarchy by the Make 
transformationi performed on the hierarchy by the 



Fig. 4 illustrates the transformation performed on the h^rarchy by the Make Strict 
algorithm, 



1 5 Fig.5 illustrates on the left another example of the 
Onto algorithm and on the right the transformation 
executed thereafter, 



trandformatton performed by the Make 
performed by the Make Strict algorithm 



Fig. 6 shows on the left the architecture of a standard C^LAP system and on the right the 
20 architecture of the present invention, and 

Fig. 7 shows the implementation of the system architedture. 
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Motivation — Case Study 

This section presents a case study that illustrates the properties of real-world dimension hier- 
archies. The case study concerns patients in a hospital, their associated diagnoses, and their 
places of residence. The data analysis goal is to investigate whether some diagnoses occur 
5 more often in some areas than in others, in which case environmental or lifestyle factors might 
be contributing to the disease pattern. An ER diagram illustrating the underlying data is seen 
in Figure 1 . 

The most important entities are the patients, for which we record the name. We always 
want to count the number of patients, grouped by some properties of the patients. Thus, in 
1 0 multidimensional terms, the patients are the facte, and the other, describing, entities constitute 
the dimensions. 

Each patient has a number of diagnoses, leading to a many-to-many relationship between 
facts and the diagnosis dimension. When registering diagnoses of patients, physicians use 
different levels of granularity, ranging from very precise diagnoses, e.g., "Insulin dependent 

1 5 diabetes during pregnancy," to more imprecise diagnoses, e.g., "Diabetes," which cover wider 
ranges of patient conditions. To model this, the relationship from patient to diagnoses is to 
the supertype "Diagnosis " which then has three subtypes, corresponding to different levels of 
granularity, the low-level diagnosis, the diagnosis family, and the diagnosis group. Examples 
of these are ''Insulin dependent diabetes during pregnancy," "insulin dependent diabetes," and 

20 "Diabetes," respectively. The higher-level diagnoses are both (imprecise) diagnoses in their 
own right, but also serve as groups of lower-level diagnoses. 

Each diagnosis has an alphanumeric code and a descriptive text, which are specified by 
some standard, here the World Health Organisation's International Classification of Diseases 
(ICD-10) [31], or by the physicians themselves. Indeed, two hierarchies are captured: the 

25 standard hierarchy specified by the WHO, and the user-defined hierarchy, which is used for 
grouping diagnoses on an ad-hoc basis in other ways than given by the standard. The Type 
attribute on the relationships determines whether the relation between two entities is part of 
the standard or the user-defined hierarchy. 

The hierarchy groups low-level diagnoses into diagnosis families, each of which consists 

30 of 2-20 related diagnoses. For example, the diagnosis "Insulin dependent diabetes during 
pregnancy" (the reason for having a separate pregnancy related diagnosis is that diabetes 
must be monitored and controlled particularly intensely during a pregnancy to assure good 
health of both mother and child) is part of the family "Diabetes during pregnancy." In the WHO 
hierarchy, a low-level diagnosis belongs to exactly one diagnosis family, whereas the user- 

35 defined hierarchy does not have this restriction. Thus, a low-level diagnosis can belong to 
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several diagnosis temilies, e.g., tlie "insulin dependent diai^etes during pregnancy" diagnosis 
belongs to both the "Diabetes during pregnancy" and the "Insulin dependent diabetes" family. 
Next, diagnosis families are grouped into diagnosis groups, consisting of 2-10 families, and 
one family may be part of several groups. For example, the family "Diabetes during pregnancy" 

5 may the part of the "Diabetes" and "Other pregnancy related diseases" groups. 

In the WHO hierarchy, a family belongs to exactly one group. In the WHO hierarchy, a 
lower-level value belongs to exactly one higher-level value, making it strict and covering. In 
the user-defined hierarchy, a lower-level value may belong to zero or more higher-level values, 
making it non-strict and non-covering. Properties of the hierarchies will be discussed in more 

1 0 detail In the section "Hierarchy Properties." 

We also record the addresses of the patients. If the address is located in a city, we record 
the city; otherwise, if the address is in a rural area, we record the county in which the address 
is located. A city is located in exactly one county. As not all addresses are in cities, we cannot 
find all addresses in a county by going through the "City located in" relationship. Thus, the 

15 mapping from addresses to cities is non-covering w.r.t. addresses. For cities and counties, 
we just record the name. Not all counties have cities in them, so the mapping from cities to 
counties is into rather than onto. 

In order to exemplify the data, we assume a standard mapping of the ER diagram to 
relational tables, i.e., one table per entity and relationship type. We also assume the use of 

20 surrogate keys, named ID, with globally unique values. The three subtypes of the Diagnosis 
type are mapped to a common Diagnosis table, and because of this, the "belongs to" and 
"grouping" relationships are mapped to a common "Grouping" table. The resulting tables with 
sample data are shown in Table 1 and will be used in examples throughout the paper. 

If we apply pre-aggregation to the data from the case study, several problems occur. For 

25 example, if the counts of patients by City are pre-computed and we use these for computing 
the numbers of patients by county, an incorrect result will occur. In the data, the addresses 
"123 Rural Road" and "1 Sandy Dunes" (one of them is the address of a patient) are not in 
any city, making the mapping from City to County not covering w.r.t. addresses. 

Next, if the counts of patients by Low-Level Diagnosis are pre-computed and we use these 

30 for computing the total count of patients, an incorrect result again ensues. First, patients only 
with lung cancer are not counted, as lung cancer is not present at the level of Low-Level 
Diagnosis; the mapping from Low-Level Diagnosis to Diagnosis Family is into. Second, pa- 
tients such as "Jim Doe" only have higher-level diagnoses and will no be counted; the fact-to- 
dimension mapping has varying granularity. Third, patients such as "Jane Doe" have several 

35 diagnoses and will be counted several times; the relationship between facts and dimensions 
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ID 


Name 


1 


John Doe 


2 


Jane Doe 


3 


Jim Doe 


Patient 



PatientID 


AddressID 


1 


50 


2 


51 


3 


52 



LivesAt 



ID 


Address 


50 


21 Central Street 


51 


34 Main Street 


52 


123 Rural Road 


53 


1 Sandy Dunes 



ID 


Name 


20 


Sydney 


21 


Melbourne 



City 



Address 



PatientID 


DiagnosislD 


Type 


1 


9 


Primary 


2 


5 


Secondary 


2 


9 


Primary 


3 


11 


Primary 



Has 



ParentlD 


ChildID 


Type 


4 


5 


WHO 


4 


6 


WHO 


9 


5 


User<leflned 


10 


6 


User-defined 


11 


9 


WHO 


11 


10 


WHO 


12 


4 


WHO 


13 


14 


WHO 



Grouping 



ID 


Code 


Text 


Type 


4 


024 


Diabetes during pregnancy 


Family 


5 


O24.0 


Insulin dependent diabetes during pregnancy 


Low-Level 


6 


024.1 


Non insulin dependent diabetes during pregnancy 


Low-Level 


9 


E10 


Insulin dependent diabetes 


Family 


10 


E11 


Non insulin dependent diabetes 


Family 


11 


E1 


Diabetes 


Group 


12 


02 


Other pregnancy related diseases 


Group 


13 


A1 


Cancer 


Group 


14 


All 


Lung cancer 


Family 



Diagnosis 



AddressID 


CitylD 


50 


20 


51 


21 



LocatedlnCity 



ID 


Name 


30 


Sydney 


31 


Melbourne 


32 


Outback 



County 



ID 


Name 




CitylD 


CountylD 


52 


31 




20 


30 


53 


32 




21 


31 



Locatedl nRuralArea 



CityLocatedin 



Table 1 : Tables for the Case Study 

is many-tomany. Fourth, Low-Level diagnoses such as "Insulin dependent diabetes during 
pregnancy" are part of several diagnosis families, which may also lead to "double" counting 
when computing higher-level counts; the dimension hierarchy Is non-strict. 

These problems yield "non-summarlsable" dimension hierarchies that severely limit the 
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applicability of practical pre-aggregation, leaving only full pre-aggregation, requtrering huge 
amounts of storage, or no pre-aggregation, resulting in long response time for queries. 

The properties described above are found in many other real-world applications. Many- 
to-many relationships between facts and dimensions occur between bank customers and ac- 

5 counts, between companies and Standard Industry Classifications (SICs), and between stu- 
dents and departments [15, 16]. Non-strict dimension hierarchies occur from cities to states 
in a Geography dimension [25] and from weeks to months in a Time dimension. In addition, 
hierarchies where the change over time is captured are generally non-strict. The mapping 
from holidays to weeks as well as organisation hierarchies of varying depth [12] offer ex- 

10 amples of "into" mappings. Non-covering relationships exist for days-holidays-weeks and for 
counties-cities-states, as well as in organisation hierarchies [12]. 

Even though many real-world cases possess the properties described above, current tech- 
niques for practical pre-aggregation require that facts are in a many-to-one relationships to 
dimensions and that all hierarchies are strict, onto, and covering. Thus, current techniques 

1 5 cannot be applied when the hierarchies has these properties. 

Method Context 

This section defines the aspects of a multidimensional data model that are necessary to define 
the techniques that enable practical pre-aggregation in applications as the one just described. 

20 The full model is described elsewhere [22]. Next, the data model context is exploited for 
defining properties of hierarchies relevant to the techniques. 

The particular data model has been chosen over other multidimensional data models be- 
cause it quite naturally captures the data described in the case study and because it includes 
explicit concepts of dimensions and dimension hierarchies, which is very important for clearly 

25 presenting the techniques. However, the techniques are also applicable to other multidimen- 
sional or statistical data models, as will be discussed in the section "Architectural Conext." 

A Concrete Data Model Context 

For each part of the model, we detine the intension and the extension, and we give an illus- 
30 trating example. 

An n-dimensional fact schema is a two-tuple S = (J^,2>), where Is a fact type and 
P = {7i,i = 1, -,n} is its corresponding dimension types. 



35 



Example 1 In the case study, Patient \s the fact type, and Diagnosis, Residence, and Name 
are the dimension types. The intuition is that everyttiing that characterises the fact type is 
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considered to be dimensional 

A dimension type T is a four-tuple (C,<r,Tr,-Lr), wliere C = {Cj,j = are the 

category types of T, <r is a partial order on the Cj 's, with Tr € C and ±r ^ ^ being the top 

5 and bottom element of the ordering, respectively. Thus, the category types form a lattice. The 
intuition is that one category type is "greater than" another category type if members of the 
former's extension logically contain members of the latter's extension, i.e., they have a larger 
value size. The top element of the ordering corresponds to the largest possible value size, 
that is, there is only one value in ifs extension, logically containing all other values. 

1 0 We say that is a category type of T, written C, e T, if Cj gC. 

Example 2 Low-level diagnoses are contained in diagnosis families, which are contained in 
diagnosis groups. Thus, the Diagnosis dimension type has the following order on its category 
types: ±Dia9nosis ^ Low-level Diagnosis < Diagnosis Family < Diagnosis Group < T Diagnosis' 
15 Other examples of category types are Address, City, and County, Figure 2, to be discussed 
in detail later, illustrates the dimension types of the case study. 

A category Cj of type Cj is a set of dimension values e. A dimension D of type T = 
({Cj}, <r, Tr, ±r) 's a two-tuple D = (C, <), where C = {C,} is a set of categories Cj such 

20 that Type(Cj) = Cj and < is a partial order on UjCj, the union of all dimension values in 
the individual categories. We assume a function Pred : C »-f 2^ that gives the set of direct 
predecessors of a category Cj. Similarly, we a assume a function Desc : C h-^ 2^ that gives 
the set of direct descendants of a category Cj. For both Pred and Desc, we "count" from 
the category Tr (of type Tr), so that category Tr has no predecessors and category ±j- 

25 (of type J-r) has no descendants. Similariy, we assume a function ImmPred : C i-> 2^ that 
gives only the set of immediate predecessors of a category Cj. Immediate predecessors are 
immediately above in tiie hierarchy, i.e.. City is an immediate predecessor of Address, but 
County is not. Similarly, we a assume a function ImmDesc i C ^ 2^ that gives the set of 
immediate descendants of a category Cj. 

30 The definition of the partial order is: given two values ei,e2 then ei < if ei is logically 
contained in e2. We say that Cj is a category of £>, written Cj e D, if Cj € C For a dimension 
value e, we say tiiat e is a dimensional value of Z>, written e e £>, if e € u^Cj. 

The category of type J-r in dimension of type T contains the values with the smallest 
value size. The category with the largest value size, with type Tr, contains exactly one value, 
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denoted T. For all values e of the dimension Z>, e < T. Value T is similar to the ALL construct 
of Gray et ai. [6]. When the context is dear, we refer to a category of type Tr as a T category, 
not to be confused with the T dimension value. 

5 Example 3 in our Diagnosis dimension we have the following categories, named by their 
type. The numbers in parentheses are the ID values from the Diagnosis table in Table 1. 
Low-level Diagnosis ^ {"Insulin dependent diabetes during pregnancy" (5), "Non insulin de- 
pendent diabetes during pregnancy" (6)}, Diagnosis Family = {"Diabetes during pregnane/' 
(4), "Insulin dependent diabetes" (9), "Non insulin dependent diabetes" (10), "Lung cancer 

10 (14)}, Diagnosis Group = {"Diabetes" (11), "Other pregnancy related diseases" (12), "Can- 
cer (13)}, and Tr>ia9nosi8 = {T}. We have that Pred (Low-level Diagnosis) = {Diagnosis 
Family}, The partial order < is obtained by combining WHO and user-defined hierarchies, 
as given by the Grouping table in Table 1. Additionally, the top value T is greater than, i.e., 
logically contains, all the other diagnosis values. 

15 

Let F be a set of facts, and D = (C ^ {C,}, <) a dimension. A fact-dimension relation 
between F and I? is a set iZ = {(/,e)}, where f ^ F and e € UjC,. Thus R links facts to 
dimension values. We say that fact / is ctiaracterised by dimension value e, written / ^ e, if 
3ei E D ((/, ei) € i? A ei < e). We require that V/ € F (3e e UjCj ((/, e) € R)); thus, all fact 
20 maps to at least one dimension value in every dimension. The T value is used to represent 
an unknown or missing value, as T logically contains all dimension values, and so a fact / is 
mapped to T If it cannot be characterised within the particular dimension. 



Example 4 The fact-dimension relation R links patient facts to diagnosis dimension values as 
25 given by the Has tat>le from the case study, so that R ^ {("John Doe" (1), "insulin dependent 
diabetes" (9))» ("Jane Doe" (2), "Insulin dependent diabetes during pregnancy" (5)), ("Jane 
Doe" (2), "Insulin dependent diabetes" (9)), ("Jim Doe" (3). "Diabetes" (11))}. Note that facts 
may be related to values in higher-level categories. We do not require that e belongs to 
±iHagno8is' For example, the fact "John Doe" (1) is related to the diagnosis "Insulin dependent 
30 diabetes" (5), which belongs to the Diagnosis Family category. This feature will be used later 
to explicitly capture the different granularities in the data. If no diagnosis was known for patient 
"John Doe" (1 ), we would have added the pair ("John Doe" (1 ),T) to R. 

A multidimensional object (MO) is a four-tuple M = (5, F, D, R), where S^iT,V = {71}) 
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is the fact schema, F = {/} is a set of facts f where Type{f) = D = {Di, i = l, n} Is a 
set of dimensions yNhere Type{Di) = 77, and = {Ri,i = l,..,n} is a set of fact-dimension 
relations, such that Vi((/, e)ejRi^/€FA 3Cj e A(e e C7,)). 

5 Examples For the case study, we get a three-dimensional MO M = {S,F,D^R), where 
S = {Patient, {Diagnosis, Name, Residence}) and F = {"John Doe" (1), "Jane Doe" (2), "Jim 
Doe" (3)}. The definition of the diagnosis dimension and its corresponding fact-dimension 
refation was given in the previous examples. The Residence dimension has the categories 
Address (= ^Residence), City, County, and Tjje^^encc- The values of the categories are given 

10 by the corresponding tables in Table 1 . The partial order is given by the relationship tables. 
Additionally, the only value in the Tn^aence category is T, which logically contains all the 
other values in the Residence dimension. The Name dimension is simple, i.e., it just has a 
Name category (= ±Name) and a T category. We will refer to this MO as the "Patienr MO. 
A graphical illustration of the schema of the "Patient* MO is seen in Figure 2. Because some 

15 addresses map directly to counties, County is an immediate predecessor of Address. 

The facts in an MO are objects with value-independent identity. We can test facts for 
equality, but do not assume an ordering on the facts. The combination of dimensions values 
that characterise the facts of a fact set is not a "key" for the fact set. Thus, several facts may 
20 be characterised by the same combination of dimension values. But, the facts of an MO is 
a set, so an MO does not have duplicate facts. The model formally defines quite general 
concepts of dimensions and dimension hierarchies, which is ideal for the presentation of our 
techniques. The presented techniques are not limited by the choice of data model. 

25 Hierarctiy Pmperties 

In this section imporfant properties of MOs that relate to the use of pre-computed aggregates 
are defined. The properties will be used in the following sections to state exactly what prob- 
lems the proposed algorithms solve. The first important concept is summarisability, which 
intuitively means that higher-level aggregates may be obtained directly from lower-level ag- 

30 gregates. 

Definition 1 Given a type T, a set 5 = {SjJ = 1,..,A:}, where Sj € 2'^, and a function 
5 : 2^ »-> T, we say that g is summarisable for 5 if giUgiSi), -.,^(5^^)}}) = ^(^i u u 5^). The 
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argument on the left-hand side of the equation is a multiset, i.e., the same value may occur 
multiple times. 

Summarisability is important as it is a condition for the flexible use of pre-computed aggre- 
5 gates. Without summarisability, lower-level results generally cannot be directly combined into 
higher-level results. This means that we cannot choose to pre-compute only a relevant se- 
lection of the possible aggregates and then use these to (efficiently) compute higher-level ag- 
gregates on-the-fly Instead, we have to pre-compute the all the aggregate results of queries 
that we need fast answers to, while other aggregates must be computed from the base data. 
10 Space and time constraints can be prohibitive for pre-computing all results, while computing 
aggregates from base data is often inefficient. 

It has been shown that summarisability is equivalent to the aggregate function {g) be- 
ing distributive, all paths being strict, and the mappings between dimension values in the 
hierarchies being covering and onto [1 6]. These concepts are formally defined below. The 
1 5 definitions assume a dimension n = {C,<) and an MO M = (5, F, D, R). 

Definition 2 Given two categories, Ci, Ca such that € Pred(Ci), we say that the mafH^^ing 
from Ci to C2 is onto iff Ve2 € CbCBei € Ci (ei < 62)). Otherwise, it is into. If all mappings 
in a dimension are onto, we say that the dimension hierarchy is onto. Given an MO M, if all 
20 dimension hierarchies in M are onto, we say that M is onto. 

Mappings that are into typically occur when the dimension hierarchy has varying height. 
In the case study, there is no low-level cancer diagnosis, meaning that some parts of the 
hierarchy have height 2, while most have height 3. It is thus not possible to use aggregates 
25 at the Low-level Diagnosis level for computing aggregates at the two higher levels. Mappings 
that are into also occur often in organisation hierarchies. 

Definitions Given three categories. Ci, C2, and C3 such that Type{Ci) < Type{C2) < 
TypeiCs), we say that the mapping from C2 to C3 is covering with respect to Ci iff Vei e 
30 Ci (Ves G C3 (ei < 63 =^ 3e2 G C2 (ei < 62 A 62 < 63))). Othenwise. it is non-covering with 
respect to Ci. If all mappings in a dimension are covering w.r.t. any category, we say that 
the dimension hierarchy is covering, if all dimension hierarchies in the dimensions of M are 
covering and all mappings between categories in the dimensions of M are covering w.r.t. F, 
we say that M is covering. 
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Non-cx)vering mappings occur when some of the links between dimension values skip 
one or more levels and map directly to a value located higher up In the hierarchy. In the 
case study, this happens for the "1 Sandy Dunes" address, which maps directly to "Outback 
County" (there are no cities in Outback County). Thus, we cannot use aggregates at the City 
5 level for computing aggregates at the County level. 

Definition 4 Given an MO M = (S,F,D,R), and two categories Ci and C% that belong to 
the same dimension Di e D such that Type(Ci) < Type{C2), we say that the mapping from 
Ci to C2 is covering with respect to F, the set of facts, iff V/ e (Vea G C2 (/ -^^i Bei G 
10 Ci(/-^iei Aei <ie2))). 

This case is similar to the one above, but now it is the mappings between facts and di- 
mension values that may skip one or more levels and map facts directly to dimension values 
in categories above the bottom level. In the case study, the patiente can map to diagnoses 
15 anywhere in the Diagnosis dimension, not just to Low-level Diagnoses. This means that we 
cannot use aggregates at the Low-level Diagnosis Level for computing aggregates higher up 
in the hierarchy. 

Definitions Given two categories, Ci and C2 such that C2 e Prerf(Ci), we say that the 
20 mapping from Ci to C2 is strict iff Vei g Ci (Ve2,e3 € C2 (ei < e2 A ei < 63 62 ea)). 
Otherwise, it is non-strict The hierarchy in dimension D is strict if all mappings in it are 
strict; otherwise, it is non-strict. Given an MO M = {S,F,D,R) and a category Cj in some 
dimension A e we say that there is a strict patti from the set of facts F to Cj Iff V/ € 
F{f -^i ei A / -v^j 62 A ei € Cj A 62 € Cj ^ ei ^ 62). (Note that the paths to the Tr 
25 categories are always strict.) The hierarchy in dimension D is said to be aggregation strict if 
the following hold: if a mapping from a category C to a category P, both in D, is non-strict, 
then Pred{P) = 0, i.e., P has no parents in the hierarchy. 

Non-strict hierarchies occur when a dimension value has multiple parents. This occurs 
30 in the Diagnosis dimension in the case study where the "Insulin dependent diabetes during 
pregnancy" low-level diagnosis is part of both tiie "Insulin Dependent Diabetes" and the "Di- 
abetes during pregnancy" diagnosis families, which in turn both are part of the "Diabetes" 
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diagnosis group. This means that we cannot use aggregates at the Diagnosis Family level to 
compute aggregates at the Diagnosis Group level, since data for "insulin dependent diabetes 
during pregnancy" would then be counted twice. Hierarchies that are aggregation strict allow 
practical pre-aggregation to be applied because the values in the parent category has no par- 
5 ents themselves. This means that double-counting of data will not occur, since the values in 
the parent categories containing possibly overlapping data will not be added together. 

Definition 6 If the dimension hierarchy for a dimension D is onto, covering, and strict, we 
say that D is normalised. Otherwise, it is un-normalised. For an MO M = {S,D,F,R), if ail 

10 dimensions Di e D are normalised and VJJi g R ((/,e) e iii e e ±d) (i.e., all facts m^ 
to dimension values In the bottom category), we say that M is normalised. Othenvise, It is 
un-normalised. If the hierarchy in a dimension D is covering, onto, and aggregation strict, 
we say that D is aggregation normalised. If all dimensions in an MO M are aggregation 
normalised and the relationships from facts to dimensions are many-to-one and only to the 

1 5 bottom category, we say that M is aggregation normalised. 

For normalised hierarchies and MOs, all mappings are summarisable, meaning that we 
can pre-aggregate values at any combination of dimension levels and safely re-use the pre- 
aggregated values to compute higher-level aggregate results. Thus, we want to normalise the 
20 dimension hierarchies and MOs for which we want to apply practical pre-aggregation. 

We proceed to describe how the normalisation of the dimension hierarchies and MOs 
used for aggregation is achieved. We first show how to perform transformations on dimension 
hierarchies, then later describe how the same techniques may be applied to eliminate the 
non-summarisable properties of fact-dimension relations. 

Dimension Transformation Teciiniques 

This section describes how dimensions can be transformed to achieve summarisability. Trans- 
forming dimensions on their own, separately from the facts, results in well-behaved dimen- 
sions that can be applied in a number of different systems or sold to third-party users. The 
30 transformation of the dimension hierarchies is a three-step operation. First, all mappings are 
transformed to be covering, by introducing extra "intermediate" values. Second, ail mappings 
are transformed to be onto, by introducing "placeholder" values at lower levels for values with- 
out any children. Third, mappings are made strict, by "fusing" values together. The three steps 
are treated in separate sections. None of the algorithms introduce any non-summarisable 
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properties, so applying each once is sufiicienL | 

In general, the aigorlthms take as input a set off tables Ac„c, that specifies the mapping 
from dimension values in category Ci to values in category ft. The input needs not contain 
all pairs of ancestors and descendants — only direct parsnt-child relationships are required. 
If there are non-covering mappings in the hierarchy, v« » 
{F,H} C PtvdlC) and T&pe{P) < IVpe(JI). In this case, the input must also contain Rp^ff 
tables that map P values to H values. 

The algorithms are expressed using recursion. They ^uld also easily be expressed using 
Iteration instead. 



Non-Covering HierarcNes 

The first algorithm renders all mappings in a dimension 
When a dimension value is mapped dtedfy to another va lue 
immediately above it in the hierarchy, a new intermedial 
immediately at>ove, and the two original dimension values 
than to each other. 



30 



hierarchy covering w.rj. any category, 
in a category higher than the one 
3 value is Inserted-into the category • 
are linked to this neMf value; rather 



Example 6 In the hierarchy for the Residence dimensiop. 
to County The address 'IZS Rural Road" (52) is in "1 
dty. and the address "1 Sandy Dunes" (53) is in "Outbade 
any cities at all. The algorithm inserts two new dimens 
and C32, which represent Melbourne and Outback 
their respective counties. The addresses "123 Rural 
finked to C31 and C32, respectively. This occurs in the 
(on the Address category: the procedure is given bel6w) 
recursively on the City, County, and T categories, nolhing 
already covering. The transformation is illustrated grappicaliy 
show the t>roblematlc'' links* and the bold-face values 
values and links. 



, two links go from Address directly 
l^elboume County" (31), but not in a 
Count/* (32), which does nof ha^ 
ion values in the City category, C31 
coijinty, respectively, and links them to 
and "1 Sandy Dunes' are then 
first call of procedure MakeCovering 
When MakeCovering is called 



happens, a^^ll mappings are 
in Figure 3, The dotted lines 
anld thick lines show the new dirhens'ion 



I imme< fiate 



In the algorithm, C is a chiW category, P Is an 
category, X are the non-covering links from CloH^ and 
in X. The M operator denotes natural join. The algorithfn 
ment category C (initially the bottom category) in line (1 1 



potent category. IT is a 'higher 
N are the ''higher" cfimension values 
worics as follows. Given the argu* 
I, the algorithms goes through all C*s 
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immediate parent categories P (2). For each immediate parent category P, it looks for prede- 
cessor categories HoiC that are "higher" in the hierarchy than P (4). if such an H exist, there 
might t>e linl^s in the mapping from C\o H that are not available by going through P. Line (6) 
finds these "non-covered" links, i, in the mapping from C7 to i? by "subtracting" the links that 

5 are available by going through P from all the links in the mapping from C\o H. Line (7) uses 
L to find the dimension values N \r\ H that participate In the "non-covered" mappings. For 
each value in iV, line (8) inserts a corresponding marked value into P\ these marked values 
represent the N values in P. The marked values in P are then linked to the original values in 
H (9) and C (10). Line (12) changes the schema so that H is no longer a predecessor of C. 

10 This can be done safely as the intermediate values and links means that the values in H and 
C that were connected before are still connected by going through P. Line (1 3) contains a re- 
cursive call to the algorithm P, thus fixing mappings higher up in the hierarchy. The algorithm 
terminates when it reaches the T category, which has no predecessors. 



(1 ) procedure MakeCovering(C) 

(2) for each P e ImmPred{C) do 

(3) begin 

(4) for each H € Pred{C) where Type{H) > Type{P) do 

(5) begin 

(6) L ^ Rc,H \ ncM^,p ^ ^p.^) 

(7) nH(i) 

(8) P ^ P U {Mark{h) \ hGN} 

(9) Rp^H ^ Rp,H U {{Mark{h), h)\heN} 

(1 0) Rc^ Rcj> U {(c, Markih)) | (c, /i) € L} 

(11) end 

(1 2) Pred(C) ^ Pred{C) \ {H} 

(1 3) MakeCovering(P) 

(14) end 

(15) end 



All Steps in the algorithm are expressed using standard relational algebra operators. The 
30 general worst-case complexity of join is O(n^), where n is the size of the input. However, be- 
cause the input to the algorithm are hierarchy definitions, the complexity of the join in the algo- 
rithm will only be C?(n log n). Thus, all the operators used can be evaluated in time O(nlogn), 
where n is the size of the input. The Mark operation can be performed in 0{1) time. The 
Inner loop of the algorithm is evaluated at most once for each link between categories, i.e., at 
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most l^jl times, where k is tlie number af categories (if all categories are directly linked to 
all others). Thus, the overall big-O complexity of the algorithm is OCfc^nlogn), where k is the 
number of categories and n is the size of the largest participating Rc^.ci relation. The worst- 
case complexity will not apply very often; in most cases, the inner loop will only be evaluated 
5 at most k times. 

The algorithm inserts new values into the P category to ensure that the mappings from 
P to higher categories are summarisable, i.e., that pre-aggregated results for P can be di- 
rectly combined into higher-level aggregate results. The new values in P mean that the cost 
of materialising aggregate results for P is higher for the transformed hierarchy than for the 

1 0 original. However, if the hierarchy was not transformed to achieve summarisability, we would 
have to materialise aggregates for G, and perhaps also for higher level categories. At most 
one new value is inserted into P for every value in G, meaning that the extra cost of material- 
ising results for P is never greater than the cost of the (otherwise necessary) materialisation 
of results for G. This is a very unlikely worst-case scenario — in the most common cases, the 

15 extra cost for P will be much lower than the the cost of materialising results for G, and the 
savings will be even greater because materialisation of results for higher-level categories may 
also be avoided. 

The correctness argument for the algorithm has two aspecte. First, the mappings in the hi- 
erarchy should be covering upon termination. Second, the algorithm should only make trans- 

20 formations that are semantically correct, i.e. , we should get the same resulte when computing 
results with the new hierarchy as with the old. The correctness follows from Theorem 1 and 2, 
below. As new values are inserted in the P category, we will get aggregate values for both 
the new and the original values when "grouping" by P. Results for the original values will be 
the same as before, so the original result set is a subset of the result set obtained with the 

25 transformed hierarchy. 

Theorem 1 Algorithm MakeCovering terminates and the hierarchy for the resulting dimension 
ly is covering. 

30 Proof: By induction in the height of the dimension lattice. Base: The height is 0, making 
the statement trivially true. Induction Step: We assume the statement is true for dimension 
lattices of height n, and consider lattices of height n + 1. For termination, we note that there is 
a finite number of {P,H) pairs, all operations in the inner loop terminate, and the algorithm is 
called recursively on P, which is the root of a lattice of height n. For the covering property, we 
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note that the insertion of intermediate, marl^ecl values into P means that the mapping from P 
to Jf is covering w.r.t C- By the induction hypothesis, the mappings higher in the hierarchy 
are fixed by the recursive call of the algorithm. 

5 Theorem 2 Given dimensions D and i?' such that D' is the result of running MakeCovering 
on D, an aggregate result obtained using U is a subset of the result obtained using 

Proof : Follows easily from Lemma 1 , next, as the Inserted values are "internal" in the hierar- 
chy. 

10 

Lemma 1 For the dimension Z>' = (C, <') resulting from applying algorithm MakeCovering 
to dimension D = (C,<), the following holds: Vei,e2 e D (ei <' e2 <=> ei < €2) (there is a 
path between any two original dimension values in the new dimension hierarchy Iff there was 
a path between them In the original hierarchy). 

15 

Proof: By induction in the height of the dimension lattice. Base: The height is 0 making 
the statement trivially true. Induction Step: We assume the statement is true for dimension 
lattices of height n, and consider lattices of height n + 1. Examing the inner loop, we see that 
the insertion of intermediate values into P, and the linking of values in C and H to these, only 
20 links values in C and H that were linked before. No links or values are destroyed by the inner 
loop. Thus, the statement is true for the links from C to P, and from C to H. By the induction 
hypothesis, the statement holds true for the transformations made by the recursive call on P. 

We see that the original values in the hierarchy are still linked to exactly the same original 
25 values as before, as stated by Lemma 1 , although new values might have been inserted in- 
between the original values. Thus, when evaluating a query using the transformed hierarchy, 
the results for the original values will be the same as when using the original hierarchy. 

Assuming only the original result set is desired, results for the new values must be ex- 
cluded, which is easy to accomplish. The new, "internal" values are marked with "mark=in- 
30 ternal", whereas the original values have "mark=original". In order to exclude the new, in- 
ternal values from the result set, the equivalent of an SQL HAVING clause condition of 
"mark=originar* is introduced into the original query. 
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Non-Onto Hierarchies 

The second algorithm renders all mappings in hierarchies 
non-bottom categories have children. This is ensured k>y 
categories to represent the childless values. These 
values, making it possible to map facts to the nevy placeholder values 
values. Tbis makes it possit)le to only map facts to the 
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onto, Le.. all dimension values in 
nserting placeholder values in lower 
are marked with the original 
instead of to the original 
category. ... 



new values 



txittom 



Example 7 In the Diagnosis dimension, the tung cancer diagnosis family (ID » 14) has no 
children. When the algorithm reaches the Diagnosis Fanuly category. It inserts a placehoWer 
value {L14) into the Low-level Diagnosis category, representing the 'Lung cancer diagnosis, 
and links it to the original value. Facts mapped to the "Lur g cancer value may then instead be 

-level 

Diagnosis Category. A graphical illustration of the tran^ormation Is seen , i h n g urcy .: The 
bold-faced'L14 value is the new value inserted; and the |hick fine between'14'and L14'is the- 
new link Inserted. 



Ell ( 



) values 



In the algorithm below, JP is sl parent category, C is 
parent values with no chiMren. The algorithm works as 
the T category) in fine (1), the algorithm goes through 
descendants of P (2). For each C, line (4) finds the 
tjy 'subtracting" the values with children in C from the 
In N, fines (5) and (6), respectively; Insert Into C a 
value, and links the new value to the original. MakeOnt^ 
The algorithms terminates when It reaches the 



(1) 
(2) 
(3) 
(4) 
(5) 
(6) 
(7) 
(8) 
(9) 



. category, 



procedure MakeOnto(P) 
for each C e J>eac(P) do 
begin 

C 4- C U {Marik(p) I p 6 //} 
Rcj> ^ JZCF U {(Afor*(p).p) \peN} 

MakeOnto(C) 

end 



a child category, and holds the 
follows. Given a category F (initialiy 
categories C that are (immediate) 
N \n-P that have no children in C, 
va[lues in P. For each ^childless* value 
marked with the parent 
is then called recursively on C (7). 
\ which has no descendants. 



place hoUer value i 



end 



7 
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Following the reasoning in the previous section, we find that the overall big-O complexity is 
0(k^nlogn), where k is the number of categories and n is the size of the largest participating 
Rci .C2 relation. However, the complexity will only be 0{kn log n) for the most common cases. 
The MakeOnto algorithm inserts new values into C to ensure that the mapping from C 

5 to P is summarisable. Again, this means that the cost of materialising results for C will be 
higher for the transformed hierarchy than for the original. However, if the new values were not 
inserted, we would have to materialise results for P, and perhaps also higher categories, as 
well as £7. At most one value is inserted In C for every value in P, meaning that the extra 
cost for C is never greater than the cost of materialising results for P. As before, this is a very 

10 unrealistic scenario, as it corresponds to the case where no values in P have children in C In 
most cases, the extra cost for C will be a small percentage of the cost of materialising results 
for P, and the potential savings will be even greater, because pre-aggregation for higher-level 
categories may t>e avoided. 

As before, the correctness argument for the algorithm has two aspects. First, the map- 

15 pings in the hierarchy should be onto upon termination. Second, the algorithm should only 
make transformations that are semantically correct. The correctness follows from Theo- 
rems 3 and 4, below. Again, the result set for the original values obtained using the original 
hierarchy will be a subset of the result set obtained using the transformed hierarchy. The 
results for the new values can be excluded from the result set by adding a HAVING clause 

20 condition. 

Theorem 3 Algorithm MakeOnto terminates and the hierarchy for the resulting dimension D' 
is onto. 

25 Proof: By induction in the height of the dimension lattice. Base: The height is 0, making 
the statement trivially true. Induction Step: We assume the statement is true for dimension 
lattices of height n, then consider lattices of height n -h 1. For termination, we note that there 
is a finite number of descendants C for each P, that all operations in the loop terminate, and 
that the algorithm is called recursively on C, which is the top element in a lattice of height 

30 n. For the onto property, we note that the insertion of placeholder values into C makes the 
mapping from C to P onto. By the induction hypothesis, the mappings further down in the 
lattice are handled by the recursive call. 
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Theorem 4 Given dimensions D and X)' sucli that £>' is ttie result of applying algorithm 
MakeOnto to D, an aggregate result obtained using JD is a sut)set of the result obtained 
using J^. 

5 Proof: Follows easily from the observation that "childless" dimension values are linked to 
new, placeholder values in lower categories in one-to-one relationships, meaning that data for 
childless values will sUII be counted exactly once in aggregate computations that use the new 
dimension. 

10 Non-Strict Hierarchies 

The third algorithm renders mappings in hierarchies strict, meaning that problems of "double- 
counting" will not occur. Non-strict hierarchies occur when one dimension value has several 
parent values. 

The basic idea is to "fuse" a set of parent values into one "fused" value, then link the child 
1 5 value to this new value instead. The fused values are inserted into a new category in-between 
the child and parent categories. Data for the new fused category may safely be re-used for 
computation of higher-level aggregate results, as the hierarchy leading up to the new category 
is strict. 

The fused value is also linked to the relevant parent values. This mapping is by nature 
20 non-strict, but this non-strictness is not a problem, as we prevent aggregate results for the 
parent category from being re-used higher up in the hierarchy. This is done by "unlinking" the 
parent category from its predecessor categories. 

The categories higher up are instead reached through the fused category This means 
that we can still get results for any original category, while being able to apply practical pre- 
25 aggregation throughout the hierarchy. In pre-aggregation terms, the "unlinking" of the par- 
ent categories means that we must prevent resulte for including this category from being 
materialised— only "safe" categories may be materialised. This should be given as a con- 
straint to the pre-aggregation system that chooses which levels of aggregation to materialise. 
We note that the algor'rthm does not introduce more levels in the hierarchy, only more 
30 categories, and that the number of "safe" categories in the result is the same as the number 
of original categories. This means that the complexity of the task of selecting the optimal 
aggregation levels to materialise is unaffected by the algorithm. 

Example 8 The result of running the algorittim on tiie Diagnosis dimension is seen in Fig- 
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ureX Because of the non-strictness in the mapping fro n Low-level Diagnosis to Diagnosis 
Family, and from Diagnosis Family to Diagnosis Group, Iw o new category types and the corre- 
sponding categories are introduced. The third picture indi[:ates the argument lo the algorithm; 
and, in addition, its dotted lines indicate the linlcs deleted I _ - - 

5 gives the result of applying the algorithm; here, the tkold face values and thick fines Indicate 
the values and links inserted by the algorithm. 

In the first call of the algorithm the three Low-level biagnosis valuer— "(low-^evel) Lung 
cancer (LI 4); "Insulin dependent diabetes during pregna ic/" (5); and TMon insulin dependent 
diabetes during pregnancy** (6)— are finked to the three new fused values— "(low-level) Lung 
10 cancer (14); "Diabetes during pregnanty. Insulin depen ient diabetes** (4, 9); and "Diabetes 
during pregnancy. Non insulin dependent cfiabetes" (4, 10)— and these are in turn linked to 
"Lung Cancer (14); -Diabetes during pregnancy" (4); -IiisuHn dependent diabetes* (9); and 
Tslon insulin dependent diabetes" (10). The these latter four values in the Diagnosis Family 
category are un-linked from their parents, as the Diagnoi ^is Family category is -unsafe." 

1-15 When called recursively on the Set-of Diagnosis Famil y category, the algorithm creates the" 

new fused values "Cancer (13) and "Diabetes, Other pregnancy related diseases* (11,12) 
in the Set-of Diagnosis Group category. These new valu bs are linked to the values "Carw»r 
(13). "Diabetes- (11), and -Other pregnancy related diseases'* (12) In the Diagnosis Group 
category, and to the T value; and the values in the Diac nosis Group category are un-Pinked 
20 from their parents. Note the Importance off having a T value: the values not linked to T are 
racaclly the unsafe values, for Which aggregate results sh 3u!d not be re-used. 



The algorithm assumes that all paths in the dimensi 



25 



m hierarchy have equal length. I.e., 
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all direct links are from children to their immediate paren ts. This is ensured k)y the MakeCov- 
ering and MakeOnto algorithms. In the algorithm below, C Is a cNki category, P is a parent 
category, 6 is a grandparent category, i\r is the new category introduced to hold the "f used- 
values, and 14 denotes natural Join. 

The algorithm takes a category C (initially the j. category) as input I then goes through 
the set of immecfiate parent categories P of C (line (2)). L ine (4) tests if there is non-strictness 
in the mapping from C to P and if P has any parents (4) 
as aggregate results for P can either be safely re-used oi i 
the algorithm in then invoked recur5lvely,.in line (20). 

If the test succeeds, the algorithm creates a new fijsed category. Rrst a new, empty 
category N with domain 2^* is created in line (G). Th j values inserted into this category 
represent sets of values off P. For example, the value "* , 



If this test fails, there is no prot>lem 
are guaranteed not be re-used; and 



, 2r represents the set consisting off 
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precisely 1, 2. Values in C are then linked to to new, fused values, representing their particular 
combination of parents in P (7). The new values are constructed using a Fuse function, that 
creates a distinct value for each combination of P values and stores the corresponding P 
values along with it. 

The resulting links are used in line (8) to insert the fused values into their category N, 
and an "Unfuse" function, mapping fused values from N into the corresponding P values, is 
used in line (9) to map the values in N to those in P. In line (10). N is included in, and P is 
excluded from, the sets of predecessors of C. The set of predecessors of N is set to P in line 
(11 ), meaning that the new category N resides in-between C and P in the hierarchy. 



(1^ 


procedure MeikeStrict (C) 


(2) 


for each P e PTed(C) do 


(3) 


begin 


(4) 


if (3ei e C (3e2, 63 € P (ei < 62 A ei < es A 62 9^ 63))) 




A Pred{P) f£0 then 


(5) 


begin 


(6) 


N CreateCategory(2^) 


(7) 


Rc,N {{ei,Fuse({e-t \ (61,62) € Rc,f}))} 


(8) 




(9) 


Rn,p •«- {(ei, 62) 1 ei e JV A 62 € Unjuae{ei)} 


(10) 


Pred(C) -h- Pred{C) U {N} \ {P} 


(11) 


PndiN) •{- {P} 


(12) 


for each G g PTed{P) do 


(13) 


begin 


(14) 




(15) 


Pred{N) <- Pred{N) U {G} 


(16) 


Pred{P) Pred{P) \ {G} 


(17) 


end 


(18) 


MakeStrict(J>0 


(19) 


end 


(20) 


eise MakeStiict(P) 


(21) 


end 


(22) 


end 



For each grandparent category G, the algorithm links values in N to values in G, in line 
(14), includes G in the predecessors of iST, in line (15), and excludes G from the predecessors 
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of P, in line (16), thereby also deleting the links from PXoG from the hierarchy. The exclusion 
of the G categories from the predecessors of P means that aggregate results for P will not be 
re-used to compute results for the G categories. 

In the end, the algorithm is called recursively on the new category, N. Note that the test 
5 for Pred{P) ^ 0 in line (4) ensures that the mapping from NXoP will not be altered, as P now 
has no predecessors. 

Following the reasoning in the previous sections, we find that the overall big-O complexity 
is C?(fwfclogwlogifc), where p is the number of immediate parent and children categories in 
the dimension type lattice, n is the size of the largest mapping in the hierarchy, and k is the 

10 maximum number of values fused together. For most realistic scenarios, p and k are small 
constants, yielding a low O(nlogn) complexity for the algorithm. 

The MakeStrict algorithm constructs a new category N and insert fused values in N to 
achieve summarisability for the mapping from N to P, and from N to G. The algorithm only 
inserts the fused values for the combinations that are actually present in the mapping from 

15 C to P. This means that the cost of materialising results for N is never higher than the cost 
of materialising results for C. This is a worst-case scenario, for the most common cases the 
cost of materialising results for N will be be close to the cost of materialising results for P. 
However, without the introduction of N, we would have to materialise results not only for P. 
but also for G and all higher-level categories. Thus, the potential savings in materialisation 

20 costs are very high indeed. 

Considering correctness, the mappings in the hierarchy should be strict upon termina- 
tion, and the algorithm should only make transformations that are semantically correct. More 
specifically, it is acceptable that some mappings be non-strict, namely the ones from the new, 
fused categories to the unsafe parent categories. This is so because unsafe categories do 

25 not have predecessors in the resulting hierarchy, meaning that aggregate results for these 
categories will not be re-used. 

The correctness follows from Theorems 5 and 6, below. When evaluating queries we get 
the same result for original values as when evaluating on the old hierarchy. The values that 
are deleted by the algorithm were not linked to any facts, meaning that these values did not 

30 contribute to the resulte in the original hierarchy. As all the new values are inserted into new 
categories that are unknown to the user, the aggregate result obtained will be the same for 
the original and transformed hierarchy. Thus, we do not need to modify the original query. 
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Theorem 5 Let D' be the dimension resulting from applying algorithm MakeStrict on dimen- 
sion D. Then the following hold: Algorithm MakeStrict terminates and the hierarchy for the 
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dimension D", obtained by removing unsafe categories from JD', is strict- 
Proof: By induction in the height of the dimension lattice. Base: The height Is 0, making 
the statement trivially true. Induction Step: Assuming that the statement is true for lattices 

5 of height n, lattices of height n + 1 are considered. All steps in the algorithm terminate, and 
the algorithm Is called recursively on either P (in the strict case) or N (in the non-strict case), 
both of which are the root of a lattice of height n, thus guaranteeing termination. 

For the strictness property, there are three cases. If the mapping from C to P is already 
strict, this mapping is not changed, and by the induction hypothesis, the statement holds for 

10 the recursive call on P. If the mapping from C to P is non-strict, but P does not have any 
parents, strictness is ensured, as P is excluded from D". If the mapping is non-strict and 
P has parents, the resulting mapping from C to is strict. By the induction hypothesis, the 
statement holds true for the recursive call on JV, as the introduction of N has not increased 
the height of the lattice. 

15 

Theorem 6 Given dimensions D and D' such that D' is the result of applying algorithm 
MakeStrict to D, an aggregate obtained using D' is the same as that obtained using D. 

Proof: Follows from Lemma 2, as all facts are mapped to values in the J. category, which is a 
20 safe category. Thus, there will be a path from a fact / to an original dimension value e Iff there 
was one in the original hierarchy, meaning that aggregate results computed using the original 
and the new hierarchy will be same. 

Lemma 2 For the dimension I?' = (C, <') resulting from applying algorithm MakeStrict to 
25 dimension D = (C7,<), the following holds. Vei,e2 e (ei G Ci A Safe(Ci) A ei <' e2 ^ 
ei < 62) (there is a path between an original dimension value in a safe category and any other 
original dimension value in the new dimension hierarchy iff there was a path between them in 
the original hierarchy). 

30 Proof: By induction in the height of the dimension lattice. Base: The height of the lattice is 0, 
making the statement trivially true. Induction Step: If either the mapping from C to P is strict, 
or P does not have any parents, the algorithm does not change the mappings, and by the 
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indictton hypothesis, the statement is true for the recursive call on P. Othennrise, we observe 
that the creation of fused values in N, and the linking of C, P, and G values to these, only 
links exactly the values in C and P, or C and G, that were linked before. Because P Is not 
safe, the links from P to G may be deleted. By the induction hypothesis, the statement is true 
5 for the recursive call on N. 

In the method described above, both the original and the new categories are kept in the 
same hierarchy. An alternative would be to keep the "unsafe" categories and the mappings to 
them In a separate hierarchy, so that only "safe" categories are kept in the main hierarchy. 

1 0 Fact-Dimension Transformation Techniques 

This section explains how the set of algorithms from the section "Dimension Transformation 
Techniques" may also be applied to the relationships between facts and dimensions, thus 
providing a basis for enabling practical pre-aggregation on concrete MOs that Include fact 
data. 

1 5 The basic idea is to view the set of facts F as the bottom granularity in the lattice. The input 
to the algorithms then consists of the facts, P, the Rf,c tables, describing the mappings from 
facts to dimension values, and the C and RcuC^ tables, describing the dimension categories 
and the mappings between them. 

Only the covering and strictness properties are considered because for the fact-dimension 

20 relationships, a mapping between facts and dimension values that is into means that not 
all dimension values in the bottom category have associated facts, which does not affect 
summarisability. As before, we first apply the MakeCovering algorithm, then the MakeStrict 
algorithm. 

The computational complexity of the algorithms will now be dominated by the size, n, of 
25 the mapping between facts and dimension values, i.e., the complexity will be 0(nlogn) if we 
assume the height of the lattice and the maximum number of values fused together to be 
small constants. This means that the algorithms can be applied to even very large databases. 

Mixed Granularity Mappings 

30 The first case to consider is the one where some of the mappings are non-covering w.rt. the 
facts, meaning that not all facts can be reached through these mappings and thus resulting in 
these facts not being accounted for in aggregate computations. This occurs when some facts 
are mapped directly to dimension values in categories higher than the X category, i.e., the 
facts are mapped to values of mixed granularities. 

35 We use the MakeCovering algorithm to make the mappings covering, initially calling it on 
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F, which is now the bottom of the lattice. The algorithm 
the facts by inserting new marked values, representing 
categories, and linking the facts to the new values 
section "Non-Covering Hierarchies," the marked values 
values, so that vtfhen new fact-dimension mappings are 
go cfffBctfy to the original parent values now instead can 
the ± category. 
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makes the mappungs covering w.r.t. 
th 3 parent values, in the intermediate 
insl ead of the parent values. As in the 
k^ep information about their original 
ai tded, the links that are supposed to 
t>e set to go to ttie marked value in 



I Patients 



o 



Dee' 



Example 9 In the case study, the mapping between 
ularity: 'John Doe* (1) and -Jane Doe" are both mapped 
pendent diabetes" (9), "Jane Doe" is additionally mappec^to 
dependent diabetes during pregnanqT (5), and "Jim 
Diagnosis Group. - . ~ . 

In the first-caH of the algorithm, two new Low-level 

senting 'Insulin dependent diabetes," and "LH; 
mapped to these instead of the original values. In the 
sis. an "Fll" value representing "Diabetes" at the DiagnG|s]s 
IDiabetes" and value "L1 1 ." S 

The transformations are Illustrated in Rgure^6-and/, 
are deleted by the algorithm and bokl-face value and thid^ lines 
links inserted by the algorithm. 



represc nting 



and Diagnoses is of mixed gran- 
the Diagnosis Family, "Insulin de- 
the Low-leyel DIagnods, "Insulin 
is mapped to "D'^etes" (11); a 



cjiagnoses are-insertedi --L9," repre- • 
"Diabetes"; and the facts are 
ecursive call on Low-level Dlagno* 
Family level Is inserted.b8tween 



where dotted lines indicate links that 
indicate dimension values and 



Many-To-Many Relationships 
The second case occurs when relationships between fadts and dimension values are many- 

25 to-many. This means that the hierarchy, with the facts as the bottom category, is non-strict, 
leading to possible doutrte-counting of tads. It is enough to make the hierarchy partly strict, 
as desalbed in ttie section "Non-Strict Hierarchies." The 1 4akeStrict algorithm is initially called 
on F, which is now the lx>ttom of the hierarchy lattice. B ^cause the MakeCovering algorithm 
has already been applied, all pattis from facts to the T value have equal length, as required 

30 by the MakeStrict algorittim. 

Some dimension values have no facts mapped to then i, leading to an interesting side effect 
of the algorithm. When the algorithm fiises values and places the fused values in-between 
the original values, it also deletes the child-to-panent and parent-to-grandparent links. The 
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fact*le5S dimension values are then left disconnected frof^ the rest of the hierarchy, with no 
links to other values. 

These bet-less dimension values do not contribute tc 
are thus superfluous. To minimise the dimensions, an 
deletes the fact-less dimension values by traversing the 
invoked in a postprocessing step. For a hierarchy of 
O(Jbnlogn), where n is the size of the mapping betweei 
overall computational complexi^ is not altered. 



any aggregate computations and 
I^elete-unconnected" algorithm that 
hierarchy starting at the facts is 
Jb, this can be done in time 
facts and dimensions. Thus, the 



height 



1 0 Example 1 0 The relationship t>etween patients and diagnoses 
the MO was transformed so that all mappings were 
MakeStricl is apf^ied to this 1^0. The final result of tte 
in Figure ;?^Nfelues in italics, e.g., L14. and dotted lines 

J irBpJd-tace.values.and^^^ lines denote values and links ir 



, vakes 



I1BW1 



15 Three new categories are introduced: "Set-of Low 
Family; and "Set-of Diagnosis Group," as non-strictness 
are inserted into these fused categories. For example. 
"Insulin dependent diabetes during pregnancy, (low-level) 
and "(low-level) Insulin dependent diabetes" (L9) are inseijted 

20 nosis** category: and the oiiglnal values are.linked to the 

Values "(low-level) Lung cancer (L14)» "Lung cancerT 
dependent diabetes during pregnane/* (6). and "Non insu 
Characterise any ^ts and are deleted by "Delete-unconn^ted* 



25 Architectural Context 
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is many-to-many. InExampleS, 
covering, as seen in Figure JS^, algorithm 



ajppfication of the algorithm is seen 
indicate deleted values and links, 
serted by the algorithm.. . . 
l 3fvel Diagnosis," "Set-of Diagnosis 
occurs at all levels. Fused values 
*(low-level) Lung Cancer (LI 4), 
Insulin dependent diabetes*^ (5, L9), 
into the "Set-of Low-level Diag- 
values. 

(14)» -Cancer (13), 'Non-insulin 
in dependent diabetes* (10) do not 



OvBrview 

The overall idea presented in this paper is to take un-normMsed 
• normalised MOs that are well supported by the practical 
30 in cuirent OLAP systems. Queries are then evaluated on 
still want the users to see only the original K/IOs, as they 
domain. This prompts the need for means of handling 



I both 



MOs and transform them into 
aggregation techniques available 
ihe transformed f^Os. However, we 
the users' understanding of the 
the origined arxi the transformed 
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ie separation of the fronl-end pre- 
srn OLAP applications consist of an 
^ server that manages the data and 
r using a standardised appfication 
kr OLAP [17] or the OLAP Coundrs 



MOs. This section explores this coexistence. 

A current trend in commercial OLAP technology is 
sentation layer from the back-end database server 
OLAP dienl that handles the user interlace and an OLAf 
processes queries. The client communicates with the ser 
programming interface (API), e.g.. Microsoil's OLE DB 1 
MDAPI [20]. The architecture of such a system is given tb the left in Figure 8. 

This s^aration of dient and server facilitates our desire to have the user see the orig- 
inal MO whne queries are evaluated against the transfo med MO. Studies have shown that 
queries on a data warehouse consist of 60% navigatiOi ia/ queries that explore the dimer>- 
sion hierarchies and 20% aggregation queries that suminartse the data at various levels of 
detail [14]. Examples of navigational and aggregaton qjeries are "Show me the Low-Level 
Diagnoses contained in the Insulin-Dependent Diabetes I >iagnosis Family" and "Show me the 
count of patients, grouped by Diagnosis Fannilyr respectively. The navigational queries must * 
be performed on the ordinal MO, vvtiilelhe aggregation queries must be performed on the 
iranstomed MO- This is acMeved by introducing an exira "Ouery Handler* component be- 
tween the dient and the server The OLAP dient send; a query to the query handler, the 
primary task of which is to determine whether the quer) is a navigational query (internal to 
a dimension) or an aggregation query (invohring the ^ct Na\ngatlonal queries are passed 
to one OLAP server that handles the original (navigational) data, while aggregation queries 
are passed to another OLAP server that manages the transformed (aggregation) data. This 
extended system architecture is seen to the right in Rgur e K ^ 

The OLAP server for navigation data needs to suppo t dimension hierarchies which have 
non-summarisabte properties, a requirement not yet supf orted by many commercial systems 
today. However, relational OLAP systems using snow-f 
port this type of hierarchies, as are some other OLAP srstems. e.g., Hyperion (Arbor) Ess- 
base [12], If tiie Ol^P system available does not have s jfficientiy flexible hierarchy support^ 
one solution is to txiild a special-purpose OLAP server tiiat conforms to the given API. This 
task is not as daunting as it may seem at first because c nly navigationai queries need to be 
supported, meaning that multidimensional queries can bs 
queries. 

We note that the only data needed to answer navigational queries is the hierarchy defi- 
nitions. Thus, we only need to store the fact data (facts 
model) once, in the aggregational data, meaning that Oic 



35 sfightiy larger than storing just the aggregational data Navigational queries are evaluated on 



translated into simple SOL "lookup* 



and lad-dimension relations. In our 
overall storage requirement is only 
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the original hierarchy definitions and do not need to be 

As described in the section "Dimension Transformali an 
need to be re-written slightly by add^g an extra HAVlNt i 
for the new values inserted by tt>e transformation algorijhms. 
matically by the query handier, giving total transparency 
HAVING clause conditions are only necessary lor the 
they can also be applied to hierarchies transformed to i 
but simplifies the query rewriling. The new values can 
WHERE clause, by performing an inner join with a taWii 
or by using nested SELECT statements as described in 
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r^-written t>y the query hancfler. 
Techniques; aggregation queries 
clause condition to exclude results 
This can easily be done auto- 
for the user. Even though the added 
covering and onto trartsformations, 
chieve strictness; this has no effect, 
also be filtered out using a modified 
containing only the original values, 
the next section. 



Concrete Implementation 

We now show how the abstract architecture described alpove can be implemented using stan- 
dard relational database technology. 

The transparency is achieved by working with two ' 
chy and k>y using a query rewrite rnechanlsm. This is i 
The overall system architecture is seen in Figure^. ^ 
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Table 2: DDiagnosis DImennon Table 



VE rsions i 



; of each user-specified hierar* 
; d4scrit>ed In detail later in this section. 



The ROl-AP client tool, in this case the ROLAP tool Synchrony, which originated from 
Kimbairs Startracker tool 1141. makes SQL requests hi the ROLAP database, in this case 
the Oracles RDBMS, using the ODBC standani. We 
transforming ODBC driver (QTOD) that, based on cai;e-spedfic metadata, transforms the 
SQL requests into requests that hide the transformatior s from the users, returning the query 
results that the user would expect based on the original hierarchies. A transformed request is 
submitted to the OLAP DB using an RDBMS-spedfic O >BC driver. The QTOD component is 
common to aO RDBMSs. so OracleS may t>e replaced b/ another RDBMS such as IBM DB2. 
Informix, or MS SQL Server Another ROLAP tool msy also be used, making the solution 
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quite general and flexible. 

We have chosen to base the prototype on an RDBMS (OracleS) since RDBMSs are the 
most commonly used platform for Data Warehouse and OLAP applications. Additionally, the 
major RDBMSs now, like dedicated multidimensional DBMSes (MDDBs), use pre-aggregated 

5 data for faster query responses [30]. However, the approach could also be implemented using 
multidimensional technology, e.g., based on the Microsoft OLE DB for OLAP standard [17], 

The transformation algorithms are implemented in Oracle's PUSOL programming lan- 
guage. The transformations are relatively fast, taking at most a few minutes, even for large 
dimensions. Once the dimension hierarchies have been transformed, the QTOD transforms 

10 queries and results between the original and transformed hierarchies. The QTOD is a thin 
layer and adds very little overhead to queries. It is implemented using GNU Rex++/Bison++ 
scanner/parser generators and the MS Visual C++ compiler. 

The two types of queries, navigation queries and aggregation queries, are treated differ- 
ently to give the user the illusion that the dimension hierarchies have their original form. 

15 The multidimensional data is captured in a star schema [14]. The dimension table for the 
Diagnosis dimension is given in Table 2, which has one column for the low-level diagnosis ID 
in addition to columns for the textual descriptions of low-level diagnoses, diagnosis families, 
and diagnosis groups. 

The hierarchy captured in the table is partially normalised, i.e., placeholder values have 
20 been introduced to balance the hierarchy (but it remains non-strict). Specifically, the 
"ILowlevellLung Cancer*' placeholder value has been inserted into the Low-level Diagnosis 
level. We prefix such values with a "!" and their level to indicate that they are inserted by the 
transformation process. Note the multiple occurrences of lower-level values caused by the 
non-strictness of flie hierarchy. This is tiie table that will be used for user navigation in the 
25 hierarchy. Its name is prefixed with a "D" to distinguish it from another "Diagnosis" dimension 
table (described below), to be used for aggregation queries. 

We now describe how to achieve transformation transparency for navigational queries. 
The query below retrieves all low-level diagnosis names. 

SELECT DISTINCT Lowlevel 
FROM Diagnosis 

Navigational queries issued by ROLAP tools have exactiy this format. The query is trans- 
formed by tiie QTOD into the query seen next, which operates against the table DDiagnosis. 
The ti^ansformed query returns the result seen in Table 3. 
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Lowlevel 




Insulin dependent diabetes durini 


} pregnancy 


Non insulin dependent diabetes dui 


ing pregnancy 



Table 3: Navigational Query Result 



achieve 



Due to the use of distinct as a quantifier, duplicat ss 
predicate removes the placeholder values inserted into 
this case is the value "!Lowleve!!Lung Cancer* As des 
translations. 

For. ag^^re^atfon queries, it is also possible to 
thougti this is more difficult For dimensions with non-; 
is introduced that holds only the part of the normaiisec^. 
strictness. In the normalised hierarchy to the right in' 
Diagnosis category and the two special categories inlro{iuced 
to hold sets of diagnosis famines and sets of diagnosis 
hierarchy is implemented in the Diagnosis dimension 



are not returned. The not like 
ihe hteraicfiy to balance it, which in 
red, the result is uiiaffected by the 
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Table 4: Dimension and Group Table s for Aggregation 



30 The 'Lowlever column contains the normal textual 



transformation transparency, -al-. 
skrictness, a speoal dimension table 
.1 lierarchy that does not contain non- 
Igure jf.^this part is the ljOw*levBl 
by tfie normalisation process 
6roups, respectively. This part of the 
seen in Table 4. 



diagnosis description, whereas the 
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; grotps 



special "Family" and 'Group" columns contain comma-i 
sets of values that are represented by the column values 
the set {4,9}. 

We need to capture the remaining part of the hierarchy, 
5 pings from a -set-of-X" category to the "X" category, e.g.. 
Group* category to the "Diagnosis Groiip" category to thfe 
to 13 (Cancer) and {11,12} to 11 (Diabetes) and 12 {Pre<|i 
trodudng a speciai table tor each such mapping, named 
(for Set^f). For example, for the Diagnosis Group category, 
10 sets of diagnosis groups to the individual diagnosis 
represents the diagnosis group, vtfhile the "SGroup" 
(£agnosis groups. 

With these tables available, it is possible to obtain 
gation queries, A ROLAP aggregation query has the 
'15 "the'number of patients per diagnosis group 

SELECT Diagnosis .Group/ SUM < Patient -Count > 
FROM Diagnosis r Patient 

WHERE Diagnosis. DiagXD=»Pat ient -DiaglD 
GROUP BY Diagnosis. Group 

20 This Is transformed into the more complex query given n^xt 

SELECT SGroup . Group , SUM ( QQQQQQQ . Count ) 
FROM Sgroup, 
(SELECT Diagnosis . Group, 

SUM (Patient .Count) AS Count 
FROM Diagnosis r Patient 
WHERE Diagnosis. DiagID«Patient.DiagID 
GROUP BY Diagnosis. Group) QQQQQQQ 
WHERE QQQQQQQ. Groups SGroup -SGroup AND 

SGroup. SGroup MOT LIKE ' !%' 
GROUP BY SGroup. Sg roup 
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separated ordered lists of the IDs of tfie 
For example, value *4.9'' represents 



r, which consists of non-strict map- 
he mapping of the^set-of-Diagnosis 
right in FIgure/B, which maps {13} 
nancy Related). This Is done t>y in- 
t>y the category prefixed with an 'S* 
\ table "SGroup" in Table 4 maps 
in the sets. The •Group" column 
colu^nn represents the associated set of 



tra nsformation transparency for aggre- 
lorniat of the query t>elow that computes * 



The transformed aggregation query has two parts, 
putes the number of patients per set of diagnosis group, 
name qqqqqqq. This part of the hierarchy is a balancec 
pre-aggregated data for optimising the query performajice. 



The nested table expression corn- 
making this available via correlation 
tree, so the RDBMS can safely use 
Ttte result of the nested tat>le 
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expression is used in the outer query, which aggregates the last part of the way up to the 
diagnosis groups using the "SGroup" table. The outer query also filters out any placeholder 
values inserted by the normalisation process (prefixed with a "!"). As a result, the client OLAP 
tool will retrieve the expected result. 
5 Good query performance without the use of excessive storage for pre-aggregated data is 
obtained by using practical pre-aggregation for the "nice" part of the hierarchy captured in the 
"Diagnosis" dimension table. The query transformation exemplified here can be performed for 
all ROLAP aggregation queries, making the solution quite general. 

1 0 Incremental Computation 

When dimension hierarchies or fact data are updated, the transformed hierarchies must be 
updated correspondingly. One solution is to recompute the hierarchies using the new data. 
This straightforward solution is attractive when updating small dimension hierarchies that only 
change infrequently, or when large bulks of updates are processed. However, for massive 
1 5 hierarchies and frequent updates, and for updates of small parts of the hierarchies in general, 
it is desirable if the algorithms need only consider the changed parts of data, which will only be 
a small fraction of tine total data volume. This section briefly describes how to incrementalise 
the algorithms. 

In addition to modifying the transformed hierarchies, it is also necessary to update the 
20 actual pre-aggregated data when the underlying base data is modified. The modified hierar- 
chies resulting from the algorithms given in this section differ only locally from the argument 
hierarchies. This means that the cost of updating the pre-aggregated data will not be greatly 
affected by the hierarchy transformations. 

In the incremental algorithms, updates are modeled as deletions followed by insertions, so 
25 we consider only the latter two modification operations. We use prefix Ai to denote inserted 
values, Arf to denote deleted values, and A to denote all modifications. For example, A^C 
denotes the values inserted into C. The category and links tables in the algorithms refer to 
the states after modifications; and when a hierarchy value is deleted, all links to that value are 
also assumed to be deleted in the same set of modifications. 

Covering Hierarchies 

Modifications may render covering hierarchies non-covering in several ways. The the left- 
most table in Table 5, named "Covering" and discussed next, indicates whether an insertion 
("Insert") or a deletion ("Delete") on tiie different parts of the input to MakeCovering may 
35 render the modified hierarchy non-covering. 
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Table 5: Effects of Insertions and Deletions on the Covering, Onto, and Strictness Properties 



Problems may arise if links are inserted into Rc,h that are not covered by insertions into 
Rc^ and Rp,H, or if links are deleted in Rc,p or Rp,h, but the con-esponding C-to-H links are 
not deleted in Rc,b- « values are deleted in P or H, their links will be deleted too, which is 
handled by the case above. Values cannot be inserted into C without any links, as all values 
in the original hierarchy must at least be linked to the T value. 

The incremental version of MakeCovering algorithm starts by finding (in line (6)) the links 
L from C to if that are not covered by the links from C to .P and P to H. These links are 
used as the base for the rest of the transformation. Thus, line (6) of the algorithm becomes 
the following expression. 

L <- Aii2c,H Unc.H(AiiJc.P ^ RP,H) OUc.HiRc,P M AdRp,H)\ncjl{^iRc.P M AiRpH) \ ^dRc,H 

Onto Hierarchies 

The effects on the onto property of insertions and deletions are outiined in the middle table in 
Table 5. Insertion of values into P, deletion of values in C, and deletion of links in Rc,p may 
cause the hierarchy to become non-onto. The incremental version of the MakeOnto algorithm 
thus starts by finding (in line (4)) the "childless" values N from P with no children In C. As a 
result, line (4) of the algorithm becomes the following expression. 



N AiP U np(Az,i2c.p) \ np(AdP) \ nF(AiJJc,p) 
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Strict Hierarctiies 

The case of maintaining the strictness property of hierarchies is more complicated because 
a new category N is introduced by the algorithm. We assume that all new categories have 
already been created before the incremental algorithm is used, i.e., if non-strictness is intro- 

5 duced in new parts of the hierarchy, we have to recompute the transformed hierarchy. The 
introduction of non-strictness requires major restructuring of both the hierarchy and the pre- 
aggregated date, so this is reasonable. 

An overview of the effect on strictness of insertions and deletions in the input to algorithm 
MakeStrict is given in the right-most table in Table 5. !f links are inserted into, or deleted from, 

1 0 i2c,F or the links to N for the affected C, P, and G values must be recomputed. 

Insertions into, or deletion from, C, r, or G will be accompanied by corresponding link 
insertions and deletions, so they are handled by the above case. The incremental MakeStrict, 
given below, works by finding the affected C, P, and G values, then recomputes their links to 
N and deletes the old links, and finally inserting the new links. As before, it is followed by a 

1 5 step that deletes the disconnected parts of the hierarchy. 



(1 ) procedure lncrementalMakeStrict(C) 

(2) for each P e Pred(C) such that PTed{P) 7^ 0 do 

(3) begin 

(4) dC ^ Uc(ARc,p) 

(5) dRc,N ^ {(c, Fase{{p \ (c,p) e dC M i2c,p})} 

(6) dN ^ UNidRc^N) 

(7) N^NUdN 

(8) Rc,N ^ Rc,N \ {(c, n) I c € dC} U dRc^N 

(9) dP ^ np(ARc,p) 

(10) dRN,P ^ \ nedN ApedPn UnFtise{n)} 

(1 1 ) Rn,p Rn,p \ {(n,i>) i p € dP} U dRNj> 

(1 2) for each G e Pred{P) do 

(13) begin 

(1 4) dG ^ Ug(ARp,g U (dP ixi Rp^g)) 

(1 5) Rn^g ^ Rn,g {{n,g)\9G dG} U nAr,<7(-RjNr,p M Rp^G ^ dG) 

(16) end 

(1 7) lncrementalMakeStrict(iNr) 

(18) end 

(19) end 
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1 . A melhod for transforming a general on-line analyticc I processing dimension into an at 
least partly aggregation normalised dimension, i.e. a diriension with improved 
summarisabiiity, by means of a computer, the dimension having dimension values 
organised into categories of dimension values based or a partial ordering, the dimension 
10 comprising mappings of links between dimension value 5, the method comprising the 
steps of - . 

retrieve the mapping from data storage means e ssociated with the computer, 
analysing the mapping to determine irregularitie j of the dimension . i.e. features 
• rendering the dimension non-summerisabte. by means 3f analysing means executed by 
15 the computer, 

creating new dimension values of the dimensior 
between dimensional values of the dimension according to the analysis, whereby the 
dimension is at least partly aggregation normalised, and 

saving the new dimension values and the modified mappings in data storage 
20 means of the computer. 



and modifying the mapping 



2. A method according to claim 1. wherein the step of c 
and modifying the mapping comprises the steps of 

examine virhether the dimension is covering, i.e. 
25 child values can be related, as well as onto, i.e. that all 
lengths, and in case it is. 

executing a make-strict procedure for making th 
that each child in a hierarchy has only one parent, thereby 
dimension aggregation normalised, i.e. summarisable 



30 



3. A method according to claim 1 or 2, wherein the step 
values and modifying the mapping comprises the steps 
examine whether the dimension Is covering, and 
executing a make-onto procedure for making th^ 
35 partly making an non-onto dimension aggregation norm 



eating new dimensional values 



that only immediate parent and 
paths in the hierarchy have equal 

3 dimension aggregation strict, i.e. 
making the non-strict 



of creating new dimensional 
of 

in case it is. 

dimension onto, thereby at least 
alised. 
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4. A method according to any of claims 1-3, wherein th« 
values and modifying the mapping comprises the step 

executing a make-covering procedure for makin ) 
5 at least partly making a non-covering dimension aggreqation 



5. A method according to any of claims 2-4, wherein the ! make-strict procedure comprises 
the steps of, starting from the bottom category and sua ;essively proceeding towards the 
top category. 

1 0 identifying combinations of dimensional values ( if the same category for each of 

which combination at least one dimension value of a ca tegory below said category is 

linked to each of the dimension values of the combinati 

creating one or more new dimensional values each representing one of the 

identified combinations of dimensional values and crealing links from the new dimensional 
15 values to dimensional values of above categories in accordance with existing links from 

each of the dimensional values represented by the new < 

identifying dimension values being linked to ider itified combinations of dimensional 

values of the same category and replacing the links witii links to new dimensional values 

representing said combinations of dimensional values. 

20 

6. A method according to any of claims 2-5, wherein th^ make-strict procedure comprises 
the successive steps of 

(i) setting the bottom category of the dimension as 1 

(ii) for each category being a direct predecessor of the child category of which 

25 category at least one dimension value of the child catec lory is linked to a dimension value 
of, setting said category as the parent category and pei forming the steps of: 

(iia) ending the make-strict procedure for the | 
category is the top category of the dimension. 

(iib) ending the make-strict procedure for the | 
30 dimension value of the parent category is linked to a dii nension value of a higher 

category. 

(ric) creating a new fused category In the dirr ension immediately t>elow the 
parent category in case at least one of the dimension v ilues of the child category is linked 
to more than one dimension value of the parent catego y, 



step of creating new dimensional 



the dimension covering, thereby 
normalised. 



the child category, 



parent category in case the parent 



parent category in case no 
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(iid) for each dimensional value of the child category, perfomiing the steps of: 
creating a new dimension value of the new fused categ siy representing the one or more 
* values of the parent category to which the dimensional /alue of the child category is 
linked and creating links from said new dimension valu<! to said values in the parent 
5 category, the creation of the new dimension value beintj conditioned that no dimension 
value of the new fused category already exists having c xactly such Hnk(s). and 

for each category being a direct predecessor of the parent category of which 
category at least one dimension value of the parent cat sgory is linked to a dimension 
value of. setting said category as a grandparent catego^ and creating links from the new 
10 dimension value to the one or more dimension values cf the grandparent category to 
which said one or more dimensional values of the parei it category are linked, . . . ■ - 
(lie) removing the links from the parent categor f to the one or more grandparent 
categories, whereby the grandparent categories no lonjier are direct predecessors of the 
parent category, 

15 (iif) creating links from each dimensional value of the child category to the 

dimension value of the new fused category having the same links to the dimension values 
of the parent category whereby the new fused category becomes a direct predecessor of 
the child category, and removing the links from the dim jnsion values of the child category 
to the parent category, whereby the parent category no longer is a direct predecessor of 

20 the child category, 
and 

(iig) setting the new fused category as the criild category and returning to step 



(ii). 



25 7. A method according to any of claims 3-6. wherein th^ 

the steps of, starting from the top category and 

bottom category. 

creating, for each dimension value of each catebory 

not being linked to any dimerisional value of the category 
3D dimension value in the category immediately below anc 

dimension value and said dimension value of the categ 



make-onto procedure comprises 
successively proceeding towards the 



B. A method according to any of claims 3-7, wherein thf make-onto procedure comprises 
the successive steps of 
35 (i) setting the top category of the dimension as the parent category. 



above the bottom category 
immediately below, a new 
creating a link between said new 
3ry in question. 
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(fi) for each category immediately below the parent 
values being linked to dimension values of the parent 
the child category and perform the steps of 

(iia) creating, for each dimension value of the 
5 to any dimensional value of the child category, a new 

category and creating a link between said new dimension 
of the parent category, 

(iib) setting the child category as parent 

(iic) ending the make-onto procedure in case 
10 category of the dimension, else returning to step (ii) of 



category and having dimension 
; setting said category as 



[category, 



category. 



parent category not being linked 
diijnenslon value in the child 

value and smd dimension value 



Ihe parent category is the bottom 
make-onto procedure. 



the 



9. A method according to any of claims 4-8, wherein the| make-covering procedure 
comprises the successive steps of 

identifying links between dimension values of tw|> categories having at least one 
15 intermediate category there between. 

creating a new dimension value In each of said 
those links for which no paths of links exists going only 
links from lower to higher categories and including a lini 
intermediate category, and 
20 replacing those links with links between the 

the new dimension values. 



I itermediate categories for each of 
through immediate child-parent 
to a dimension value of the 



dime nsion values of those links and 



10. A method according to any of claims 4-9, wvherein 
comprises the successive steps of 
25 (i) setting the bottom category of the dimension as 
(ii) for each category immediately above the child 
between a dimension value of said category and a dime 
exists, setting the category as the parent category and 

(iia) ending the make-covering procedure for 
30 parent category is the top category of the dimension:, 

(iib) for each higher category being a direct 
category and being higher in the hierarchy than the par4nt 
of 

(iiba) identifying sets of dimension valu 
35 dimension values of the child category for which 



ttie make-covering procedure 



the child category, 
aitegory for which at least one link 
nsion value of the child category 
f lerform the steps of: 
the parent category in case the 

pi edecessor category of the child 
category, performing the steps 

3s of the higher category and 
sets 
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higher categories and including a 



a link exists, and 

no paths of lini<s going only from lower td I 
link to a dinnension value of the parent category exists, and 

(iibb) creating for each identified set of dimension values a new dimension 
value in the parent category, creating links between each of the dimension values 
of the set and the new dimension value, and removing the link between the two 
dimension values of the identified set, whereby t^e higher category no longer Is a 
predecessor of the child category, 
(lie) setting the parent category as the child c ategory and returning to step (ii). 



10 



11. A method for by means of a computer to at least pa tly aggregation normalise a 
multidimensional object including a set of facts comprising a plurality of fads mapped on a 
plurality of dimensions having dimension values organi: ed into categories of dimension 
values based on a partial ordering, the multidimensional i 
15 links between dimension values within each dimension, 



of any of claims 1-10 to at least one of the dimensions of the multidimensional object. 



12. A method according to claim 11. wherein the 
plurality of facts and the mapping comprises links from 
20 dimension value in each of the plurality of dimensions, 
layer of each of the dimensions of the multidimensional 



13. A method according to claim .11 or 12. comprising 
selecting a subset of categories of the one or 
25 normalised, and 

performing an aggregation normalisation of the 

whereby one or more of the dimensions of the 

aggregation normalised. 



object comprising mappings of 
by means of applying the method 



multidimensional object comprises a 
5ach of the facts to at least one 
the facts constituting the bottom 
object. 



the 



mi >re 



steps of 

dimension to be aggregation 



selected subset, 
multidimensional object is/are only partly 



30 14. A method according to any of claims 11-13. compris ing the steps of 

selecting specific aggregation functions to be performed on the multidimensional 
object, and 

selecting by means of the computer normalisation steps to be performed based on 
the selection of specific aggregation functions to be per brmed. 
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whereby one or more of the dimensions of the muttidime tnsionat object is/are only partly 
aggregation normalised. . 

15. A method for by means of a computer to at least paitly aggregation nomialise a 
5 general on-line analytical processing multidimensional c bject including a set of facts 
comprising a plurality of facts mapped on an aggregatio i normalised plurality of 
dimensions having dimension values being organised irto categories of dimension values 
based on a partial ordering, the multidimensional object comprising mappings of links 
between dimension values within each dimension. 
10 the method comprising the steps of 

retrieve the mapping from data storage means s ssociated with the computer, 
including the mapping of the plurality of facts on the multidimensional object into 
the mapping of the multidimensional object so that the napping comprises links from each 
of the facts to at least one dimension value in each of tl" e plurality of dimensions, and the - 
1 5 facts constitutes the bottom layer of each of the dimens ons of the multidimensional 
object, 

analysing the mapping of the multidimensional c bject to determine irregularities of 
the dimensions by means of analysing means executec by the computer. 

creating new dimension values of the multidimensional object and modifying the 
20 mapping between dimensional values of the multidimensional object according to the 
analysis, whereby the multidimensional object is at least partly aggregation nonmalised. 
and 

saving the new dimensions and the modified ms pping in data storage means of 
the computer. 

25 

16. A method according to claim 15, wherein the step o^ creating new dimensional values 
and modifying the mapping comprises the step of 

executing a make-strict procedure for making this 
aggregation strict, thereby making the non-strict multidimensional object aggregation 
30 normalised, the make-strict procedure being executed t in the condition that the 
multidimensional object is covering prior to the executic n. 

17. A method according to claim 15 or 16, wherein the ;tep of creating new dimensional 
values and modifying the mapping comprises the step of 



3 multidimensional abject 
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executing a make-covering procedure for makinj the multidimensional object 
covering, thereby at least partly making tlie non-coverin g multidimensional object 
aggregation normalised. 

I 

18. A method according to any of daims 15-17, whereir the method comprises the initial 
step of making each of the plurality of dimensions aggrcigatton normalised by means of 
the method according to any of claims 1*10. 



19. A method according to any of claims 11-18. whereir 
10 values are marked as such, a pre-aggregation is 

being normalised by means of the computer and the 

producing a reply to a query made to the systenr 

multidimensional object, aggregate queries, exploring 

as navigation queries, that summarise the data at 
1 5 the existence of the created new dimensional values is 



the created new dimensional 
performed on a multidimensional object 
method further comprises the step of 

and concerning the 
tfjie dimension hierarchies, as well 
vario^js levels of detail, in which reply ' 



transparent. 



20. A method according to any of claims 11-19, further 
Implementing, into the aggregation nonmalised 

facts including mapping of the facts onto the dimension 
20 dimensions, or of new mapping k>etween some of the 

implementation in^egutarities of the multidimensional ob 
analysing the introduced irregularities of the 

object, 

creating new dimensional values of the 
25 mapping between dimensional values of the 

analysis, whereby the multidimensional object is aggredation 

saving the new dimensions and the modified mapping 
the computer. 



30 21 . A computer system comprising at least one general 



storage means associated therewith on which data stor age means is stored a computer 



comprising the steps of 
rpulttdimensional object, of new 

of new dimension values of the 
dimension values, by which 
ect is introduced, 
dimensions of the multidimensional 



multidimjensional object and modifying the 
multidimensional object according to the 
normalised, and 
in data storage means of 



purpose computer having data 



[o perform an at' least partly 
according to the method of any of 



programme product suitable for adapting the computer 
aggregation normalisation of a multidimensional object 

claims 1 1-20, the computer system comprising means |for retrieving the computer 
35 programme product and perform accordingly. 
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22. A computer programme product suitable for adaptinc i 
perform an at least partly aggregation normalisation of a 
according to the method of any of claims 11-20. 



10 



23. A computer system for on-fine analytical processing 
associated therewith on which a multidimensional objecl 
object including 

a set of facts comprising a plurality of facts, 
a first plurality of dimensions having dimension 
categories of dimension values based on a partial orderi 
of liriks between dimension values within each dimension 
dimensions as well as links between the facts and the di 
dimensions, at least one of the dimensions of the first 
15 irregular, and 

a second plurality of dimensions having dimension 
categories of dimension values t>ased on a partial order ng 
mapping of links between dimension values within each 
of dimensions as well as links between the facts and the 
20 plurality of dimensions, each of the second plurality of d 
normalised. 

the computer system comprising a query handle 
producing replies to queries made to the computer syst( m 
multidimensional object, the replies to navigation queriep 
25 dimensions and the replies to aggregate queries being 
dimensions. 



having data storage mear\s 
is stored, the multidimensional 



24. A computer system according to claim 23, wherein 
relating to the second plurality of dimensions is further 
30 means and the replies to aggregate queries furthermore 
aggregation data. 



25. A computer system according to claim 23 or 24. 
is adapted for producing replies to aggregate queries in 
35 second plurality of dimensions is transparent 



a general purpose computer to 
multidimensional object 



vialues being organised into 
ng and comprising a first mapping 

of the first plurality of 
mensions of the first plurality of 
pi jraiity of dimensions being 

values being organised into 
and comprising a second 
dimension of the second plurality 
dimensions of the second 
mensions being aggregation 

' component being adapted for 
and concerning the 
being based on the first set of 
ibased on the second set of 



set of pre-aggregation data 
sjtored within the data storage 
are based on the set of pre- 



wherein the query handler component 
which replies the existence of the 
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26. A computer system according to claim 25, wherein 
adapted for transf omr^ing aggregate queries made to the 
queries for the second set of dimensions and transfomfiiig 
5 set of dimensions into replies as based on the first set 
existence of the second plurality of dimensions transparjent 



27. A computer system according to claim 26. wherein 
stored within the data storage means of the computer 
10 combination of star schemes for the part of the 

strict mappings, and additional tables containing the 
query handler component makes use of said tables in 



tnei 



system i 



multidimensional object is 
In tables organised as a 
multidimbnsional object containing only 
non-strict part of the mappings, the 
tr ansforming queries and replies. 



28. A computer system according to any of claims 23-2 ' 
15 adapted for peiforming an at least partly aggregation nc 
object according to the method of any of claims 1 1-20, 



query handler component is 
first plurality of dimensions into 
replies based on the second 
dimensions, thus making the 
in the produced reply. 



further comprising means 
rmaiisation of a multidimensional 
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(57) Abstract: A method, a computer system and a computer programme product for a computer system for iransfonniBg general 
On-line Analytical Processing (OLAP) hierarchies into sununarisable hierarchies whereby pre-aggregation is disclosed, by which 
fast query response times for aggregation queries without excessive storage use is made possible even when the hierarchies originally 
are irregular. Pre-aggregation is essential for ensuring adequate response time during data analysis. Most OLAP systems adopt the 
practical pre-aggregation approach, as opposed to full pre-aggregation, of materialising only select combinations of aggregates and 
then re-use these for efficiently computing other aggregates. However, this re-use of aggregates is contingent on the dimension 
hierarchies and the relationships between facts and dimensions satisfying saingent constraints. The present invention significantly 
extends the scope of practical pre-aggregation by transforming irregulare dimension hierarchies and fact-dimension relationships 
into well-behaved structiu'es that enable pracdcal pre-aggregaiion. 
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